Opened 9 years ago

Closed 6 years ago

# improve prime_pi (speedup + small fixes)

Reported by: Owned by: ohanar was major sage-6.3 number theory primes, prime counting, prime_pi sd40.5 was, leif, kevin.stueve, victor, kcrisman R. Andrew Ohana Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman N/A e3e33e6 (Commits) e3e33e62cc19bf93b5236e7d1177482bd52b8a9e #15138

I have rewritten the prime_pi method from scratch, it is much cleaner and less hacky this time (as I have learned more about coding), however it is still based on the same algorithm as before (no LMO yet, although I intend to take a stab at it later this year). This was developed in parallel with a method to count primes in residue classes (ticket forthcoming).

The overhead for small input >= 2 is rather large in the current prime_pi, this has been dramatically improved:

New:

sage: timeit('prime_pi(100)')
625 loops, best of 3: 475 ns per loop
sage: timeit('prime_pi(10000)')
625 loops, best of 3: 483 ns per loop


Old:

sage: timeit('prime_pi(100)')
625 loops, best of 3: 1.71 µs per loop
sage: timeit('prime_pi(10000)')
625 loops, best of 3: 5.56 µs per loop


There is also a ~5x speedup (conservatively) for larger input:

New:

sage: timeit('prime_pi(10**8)')
625 loops, best of 3: 530 µs per loop
sage: timeit('prime_pi(10**10)')
5 loops, best of 3: 27.1 ms per loop
sage: time prime_pi(10**12)
37607912018
Time: CPU 1.53 s, Wall: 1.53 s


Old:

sage: timeit('prime_pi(10**8)')
125 loops, best of 3: 3.66 ms per loop
sage: timeit('prime_pi(10**10)')
5 loops, best of 3: 161 ms per loop
sage: time prime_pi(10**12)
37607912018
Time: CPU 8.65 s, Wall: 8.64 s


The user can also give a second argument to use additional memory for a potential speedup. All primes less than the second argument are used in computing pi(x). For example:

sage: time prime_pi(10**12, 10**8)
37607912018
Time: CPU 0.92 s, Wall: 0.92 s
sage: time prime_pi(10**12, 10**8)    # pari's prime table is now cached
37607912018
Time: CPU 0.58 s, Wall: 0.58 s


An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between 9.5*10**15 and 9.75*10**15, for issues that I had previously been unable to determine, so I had initially limited input to be < 2**49, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.

A lot of the speed of this algorithm relies on the __cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.

The other big contributor to the speed is the __smallPi table, which stores the values of pi(x) for x < 2**16. This is used in the same manner as __cached_count, but is of course much faster.

Finally, I have introduced the legendres_formula method, which takes two arguments, x and a, and returns the number of positive integers not larger than x that are not divisible by any -- i.e. co-prime to each -- of the first a primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.

Apply trac_11475mk2.patch and trac_11475-reviewer.patch to the Sage library.

### Changed 9 years ago by ohanar

based on sage-4.7

### comment:1 Changed 9 years ago by ohanar

• Description modified (diff)
• Status changed from new to needs_review

### comment:2 follow-up: ↓ 5 Changed 9 years ago by leif

Haven't yet looked a the rest...

### comment:3 Changed 9 years ago by leif

• Authors changed from ohanar to R. Andrew Ohana

### comment:4 follow-up: ↓ 6 Changed 9 years ago by ylchapuy

A lot of the speed of this algorithm relies on the cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes)

Something like this should give you a 10% speedup for big inputs.

        if p < 200:
pos = p>>2
size = 5
else:
size = self.__numPrimes>>1
# Use the expected density of primes for expected inputs to make an
# educated guess
if p>>3 < size:
size = p>>3
pos = size


(beware to make sure there are enough primes cached for small inputs)

### comment:5 in reply to: ↑ 2 Changed 9 years ago by leif

(I wouldn't call it 'main' btw. ;-) )

Oops, s/call/name/. :)

### comment:6 in reply to: ↑ 4 Changed 9 years ago by ohanar

Sorry about that, I was planning on moving the folder, but I forgot to. I'll fix that tonight.

Something like this should give you a 10% speedup for big inputs.

Thanks, I'll try it out tonight, and update the description and patch if it proves fruitful.

### comment:7 Changed 9 years ago by ohanar

• Description modified (diff)

### Changed 9 years ago by ohanar

Makes use of Yann's suggested improvement.

### comment:8 follow-up: ↓ 9 Changed 9 years ago by leif

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?

(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)

### comment:9 in reply to: ↑ 8 Changed 9 years ago by ohanar

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

I did most of my testing on mod.math and sage.math, which is ubuntu linux (64-bit). I don't have failing results any longer, but I can tell you that I discovered the issue with 10**16, and I also know that it fails for 9.75*10**15. Give me 7-8 hours and I'll get you the output for those inputs.

While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?

I don't know how the new one behaves on 32-bit systems, since I have done literally no testing. I hope it does better than the old one, since I used stdint this time around (and the only real difference between my 64-bit vs 32-bit implementation is the use of pointers in the later).

(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)

### comment:10 follow-up: ↓ 11 Changed 9 years ago by robertwb

Apply only trac_11475v2.patch

### comment:11 in reply to: ↑ 10 Changed 9 years ago by leif

• Description modified (diff)

Apply only trac_11475v2.patch

:D

### comment:12 Changed 9 years ago by leif

• Description modified (diff)

### Changed 9 years ago by ohanar

Fixes doctests, and changed cached_primes again to improve speed

### comment:13 follow-up: ↓ 32 Changed 9 years ago by robertwb

Apply only trac_11475v3.patch

### comment:14 Changed 9 years ago by ohanar

• Description modified (diff)

### comment:16 follow-up: ↓ 17 Changed 9 years ago by ohanar

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s


The correct value is 272450167482953 (http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16) but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).

### comment:17 in reply to: ↑ 16 Changed 9 years ago by leif

Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?

sage: time prime_pi(9.75*10**15)
272450165623660
Time: CPU 25614.37 s, Wall: 25614.47 s


The correct value is 272450167482953 (http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16) but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).

Never mind. I thought you had collected a bunch of false results during development. If (apparently) all platforms are affected, it should be easier to track this down - rather than flaws due to e.g. Apple "specifics"... ;-)

Btw, I have a large collection of correct prime counts here, so if eventually the sysloads drop, I'll by myself check some results against your new implementation.

Anyway, anybody feel free to compute more values above our current limit and post them here, whether correct or not. :-)

### Changed 9 years ago by ohanar

Some major speed improvements using ~4kB of tables

### comment:18 Changed 9 years ago by ohanar

• Description modified (diff)

### comment:19 Changed 9 years ago by ohanar

An update:

v4: added a table of prime_pi values that can be stored in a byte -- using this with the __cached_primes method gave a giant speedup. Since the sum at the beginning of __phi (and __phi32) differs from 16*(x/77) by a small factor that only depends on x%2310, so adding a table with these factors results in another large speedup. These tables take a total of ~4kB.

v5: the tables introduced a significant initialization time for prime_pi, and William asked me to fix this, so prime_pi (and legendres_formula) only initialize these tables on an initial call -- so initialization time is now ~650ns:

sage: timeit('sage.functions.prime_pi.PrimePi()')
625 loops, best of 3: 627 ns per loop


Both Yann and I have tested 9.75*10^15 again and got the correct result, I also did 10^16. We both ran it on 64-bit GNU/Linux systems. So the issue I was encountering may have been due to boundary issues with the __cached_primes method (which have been rewritten over the course of the patches). I kept the limit in since I haven't yet tested this thoroughly.

### Changed 9 years ago by ohanar

Reduced initialization time

### comment:20 Changed 9 years ago by ohanar

• Description modified (diff)

### comment:21 follow-up: ↓ 22 Changed 9 years ago by leif

I've so far only tested a little (on 64-bit Linux; dead old 32-bit box still building Sage), but already found one wrong result with patch v3:

  502000000000000  15296923904889   # 11475v3
502000000000000  15296922045596   # correct result, (at least) double-checked


Note that 502*1012 is below 249 (562.949.953.421.312).

As my machines are still busy, you could perhaps check if the newer version(s) give the correct result.

### comment:22 in reply to: ↑ 21 ; follow-up: ↓ 23 Changed 9 years ago by ohanar

Works on sage.math with 11475v5:

sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s


### comment:23 in reply to: ↑ 22 Changed 9 years ago by leif

Works on sage.math with 11475v5:

sage: time prime_pi(502000000000000)
15296922045596
Time: CPU 723.40 s, Wall: 723.43 s


Yep, here too; with both v4 and v5 (64-bit Linux).

### comment:24 follow-up: ↓ 25 Changed 9 years ago by leif

  580000000000000  17596211452162
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
581000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
582000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
583000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
584000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
585000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
586000000000000               0


Hmm, really?

### comment:25 in reply to: ↑ 24 Changed 9 years ago by ohanar

  580000000000000  17596211452162
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
581000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
582000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
583000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
584000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
585000000000000               0
^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored
586000000000000               0


Hmm, really?

Oops, thankfully easily fixed. Shouldn't occur in the next patch (whenever that is).

### comment:26 follow-up: ↓ 27 Changed 9 years ago by leif

• Status changed from needs_review to needs_work

Ok, here are my first 32-bit tests (with patch v5):

sage -t  "devel/sage-11475v5/sage/functions/prime_pi.pyx"
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
sage: prime_pi(10**10)
Expected:
455052511
Got:
498132213
**********************************************************************
1 of  14 in __main__.example_1
***Test Failed*** 1 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
[6.1 s]

----------------------------------------------------------------------
The following tests failed:

sage -t  "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 6.1 seconds

real    0m6.293s
user    0m5.328s
sys     0m0.760s


With long tests included, some unhandled overflows occur:

sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126:
sage: prime_pi(10**10)
Expected:
455052511
Got:
498132213
**********************************************************************
File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 231:
sage: for n in (42,41..32): prime_pi(2**n) # long time (21s on mod.math, 2011)
Expected:
156661034233
80316571436
41203088796
21151907950
10866266172
5586502348
2874398515
1480206279
762939111
393615806
203280221
Got:
2072485352
555689055
4249350341
1916569322
829105263
2484680881
3345496393
1649125207
813719441
403824770
203621732
**********************************************************************
1 of  14 in __main__.example_1
1 of  14 in __main__.example_4
***Test Failed*** 2 failures.
For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py
[52.1 s]

----------------------------------------------------------------------
The following tests failed:

sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx"
Total time for all tests: 52.2 seconds

real    0m52.318s
user    0m49.923s
sys     0m0.956s


This is on Ubuntu 7.10 :) (P4 Northwood, gcc 4.2.1) with Sage 4.7.

### comment:27 in reply to: ↑ 26 ; follow-up: ↓ 28 Changed 9 years ago by ohanar

Unfortunate, but kind of expected. It'll probably take a day or two for me to set up a 32-bit vm to try to debug the issue (it isn't blatantly obvious to me what is going wrong).

### comment:28 in reply to: ↑ 27 Changed 9 years ago by leif

(it isn't blatantly obvious to me what is going wrong).

Yep, especially the "long int doesn't fit into int" reminds me of 16-bit systems... 8/

### Changed 9 years ago by ohanar

fixes issue on 32-bit systems

### comment:29 Changed 9 years ago by ohanar

• Description modified (diff)
• Status changed from needs_work to needs_review

### comment:30 follow-up: ↓ 31 Changed 9 years ago by ohanar

Hopefully that was the only issue with 32-bit systems. Basically gmp wants a 32-bit int for mpz_set_ui on 32-bit systems, so I gave it what it wanted (this was also causing the issue with the __sieve function, since I use the mpz_sqrt function to calculate the argument).

I am currently doing some more thorough testing with large values on a 64-bit machine, so that we can be sure there isn't any more breakage on the stable platform.

### comment:31 in reply to: ↑ 30 Changed 9 years ago by leif

Hopefully that was the only issue with 32-bit systems.

v6 passes long tests on the 32-bit Ubuntu 7.10 machine (in about a minute). More to come.

### comment:32 in reply to: ↑ 13 Changed 9 years ago by ohanar

Apply only trac_11475v6.patch

### comment:33 follow-up: ↓ 35 Changed 9 years ago by ohanar

• Status changed from needs_review to needs_work

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :). I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Unfortunately, I have also come across an issue with solaris 10 sparc (t2.math). The root of the cause is that mpz_export has the bit ordering arguments flipped. I am not aware of another way to get unsigned ints out of gmp integers, nor am I aware of a way to check the platform from cython. Also, I don't know if other platforms have this same issue (solaris86 or mac powerpc in particular), so if someone could test that, please let me know.

### Changed 9 years ago by ohanar

100% doctest coverage + slightly faster init + solaris/sparc fix

### comment:34 Changed 9 years ago by ohanar

• Status changed from needs_work to needs_review

Apply only trac_11475v8.patch

### comment:35 in reply to: ↑ 33 ; follow-ups: ↓ 36 ↓ 37 Changed 9 years ago by leif

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines, I've successfully computed pi(2k) for k=0...53 and pi(10k) for k=0...16 on each of three different machines (two of them 32-bit; all with patch version 6).

Some computations are still in progress, but so far I could also verify the results of prime_pi(2**54) and prime_pi(2**55) (on at least one machine). I've also played a little with "legendres_formula()".

I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...

I wonder if legendres_formula() shouldn't be renamed to legendre_phi() - better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi would be analogous to e.g. euler_phi in Sage. (I know Wolfram and others call "it" - i.e., IMHO the method computing the respective function rather than the function itself - Legendre's Formula as well, but it's not that unambiguous.)

I'd also mention in its docstring it's called the partial sieve function.

### comment:36 in reply to: ↑ 35 Changed 9 years ago by leif

I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.

Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...

Ah, I see, you already removed the restriction in patch v8 (and also added warnings w.r.t. seemingly "endless" computations, which I also wanted to suggest).

There's a small typo in __call__()'s docstring (the 2**32 should read 2**64):

        This implementation uses unsigned 64-bit ints and thus does not
support x >= 2**32::

sage: prime_pi(2^64)
Traceback (most recent call last):
...
NotImplementedError: computation of prime_pi for x >= 2^64 is not implemented


Also, there's a mixture of both kinds of exponentiation operators (^ and **) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.

### comment:37 in reply to: ↑ 35 ; follow-up: ↓ 38 Changed 9 years ago by ohanar

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines,

Which machines have you had issues with? Everything I have tested with v8 has worked (this includes x86 and x86_64 linux, 64bit Snow Leopard, and Sparc Solaris 10).

I wonder if legendres_formula() shouldn't be renamed to legendre_phi() - better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi would be analogous to e.g. euler_phi in Sage.

Makes sense to me, if no one has an objection to the idea, I'll make the change.

I'd also mention in its docstring it's called the partial sieve function.

Sounds good to me.

Also, there's a mixture of both kinds of exponentiation operators (^ and **) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.

Will do -- probably ^ since it is more consumer friendly.

### comment:38 in reply to: ↑ 37 Changed 9 years ago by leif

I have verified that prime_pi gives correct output for i*10**12 with 0 <= i <= 1000 on x86_64 :).

Besides a couple of other values on individual machines,

Which machines have you had issues with?

Oh, I didn't get any wrong results; I just meant I've also tested prime_pi(x) with some "random" values (besides the mentioned powers of 2 and ten), but not systematically on all machines (all Linux boxes; Pentium4 Northwood, P4 Prescott and Core2).

### comment:39 Changed 9 years ago by leif

• Description modified (diff)

Computation of prime_pi(2**56) just finished with the correct result on one of the 32-bit machines :-) (took 2 days+).

[Updated description w.r.t. (now obsolete) limitation / potentially wrong results for inputs exceeding 249 with earlier versions.]

See comment #40

### comment:40 Changed 9 years ago by ohanar

So through the updates, I haven't been keeping legendres_formula up to date. I am posting a patch that should fix the bugs that were present for the past few versions (primarily, calling it with a relatively small x in comparison to a would give an extremely wrong answer due to overflow issues). I have also changed the name, so it now is called either legendre_phi or partial_sieve_function.

The patch also makes all documentation use ^, there are no longer any instances of **.

The patch also touches up on __phi and __phi32 -- cleaning up code, adding a few comments, and makes a simple optimization to reduce boolean checks for small x values (~10% speedup for large input).

I'm hoping this will be the final patch for this ticket, so that I don't go into the double digits for version numbers :-).

Apply only trac_11475v9.patch

### comment:41 Changed 9 years ago by leif

• Description modified (diff)

The new version comes in the right moment as the slow 32-bit machine just finished computing prime_pi(2^57) (successfully; still with v6 though). No separate timings, but pi(254) ... pi(257) took 7.5 days in total. :-)

At first glance, there are only a few typos:

• There's a superfluous + in the computation of self.__tabS[i] (continuation line).
• The 249 remains renitent:
    cdef uint64_t __pi(self, uint64_t x) except -1:
r"""
Returns :math:\pi(x) and assumes :math:self.__maxSieve < x < 2^{49}
"""

• s/number of primes/number of positive integers/:
    cdef uint64_t __phi(self, uint64_t x, uint64_t i):
r"""
Legendre's formula: returns the number of primes :math:\leq x
that are not divisible by the first i primes
"""


### comment:42 Changed 9 years ago by leif

• Reviewers set to Yann Laigle-Chapuy, Leif Leonhardy

So far passed all [long] tests on Ubuntu 7.10 x86 (Sage 4.7) and Ubuntu 9.04 x86_64 (Sage 4.7.1.alpha3; patch to module_list.py applies with 1 line offset).

That's of course a bit funny (I don't mind):

sage: legendre_phi(2^63,0)
computation of legendre_phi for large x can take minutes, hours, or days depending on the size of x
computation of legendre_phi for x >= 10^15 has not been thoroughly tested, be cautious of the result
9223372036854775808
sage:


We may also add "or weeks", but the current version is faster than the previous ones and much faster than the current version in Sage of course (not to mention its odd limitation).

There are opportunities for further improvements (e.g. copying the whole list of primes when we need more; perhaps some more comments / documentation), but these can IMHO be left to follow-up tickets.

In principle positive review from me (minus the typos), just hesitating to give it at this point without having studied the code more thoroughly.

### comment:43 follow-ups: ↓ 44 ↓ 46 Changed 9 years ago by leif

Another corner case:

sage: legendre_phi(1000,10^10)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)

/home/leif/Sage/sage-4.7.1.alpha3/devel/sage-11475v9/<ipython console> in <module>()

/home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5888)()

/home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5517)()

NotImplementedError: computing legendre_phi for a > prime_pi(2^32-1) not implemented
sage:


(The message is a bit misleading by the way, since we limit x to 64 bits, and therefore the result is independent of a >= 2^32. If at all, I'd also give the numerical upper limit for a, i.e. 203280221. But as I said, there isn't really any limit on a.)

Also, don't try this at home:

sage: time legendre_phi(1000,10^8)


(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)

### comment:44 in reply to: ↑ 43 Changed 9 years ago by leif

The message is a bit misleading by the way, since we limit x to 64 bits, and therefore the result is independent of a >= 2^32.

More precisely: independent of a >= prime_pi(2^32). ;-)

### comment:45 Changed 9 years ago by leif

I'd also use int_fast8_t and uint_fast8_t instead of int8_t and uint8_t, since size doesn't really matter there:

• ## sage/functions/prime_pi.pyx

diff --git a/sage/functions/prime_pi.pyx b/sage/functions/prime_pi.pyx
 a include '../ext/stdsage.pxi' include '../ext/interrupt.pxi' from libc.stdint cimport int8_t, uint8_t, uint32_t, uint64_t from libc.stdint cimport int_fast8_t, uint_fast8_t, uint32_t, uint64_t from sage.rings.integer cimport Integer from sage.rings.real_mpfr import RealLiteral, create_RealNumber, RR from sage.rings.arith import nth_prime cdef uint32_t __numPrimes cdef uint32_t __maxSieve cdef uint64_t __maxSieve64 cdef int8_t[2310] __tabS cdef uint8_t[1619] __smallPi cdef int_fast8_t[2310] __tabS cdef uint_fast8_t[1619] __smallPi def __dealloc__(self): if self.__numPrimes:

### comment:46 in reply to: ↑ 43 ; follow-up: ↓ 47 Changed 9 years ago by ohanar

Another corner case:

sage: legendre_phi(1000,10^10)
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)

/home/leif/Sage/sage-4.7.1.alpha3/devel/sage-11475v9/<ipython console> in <module>()

/home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5888)()

/home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5517)()

NotImplementedError: computing legendre_phi for a > prime_pi(2^32-1) not implemented
sage:


(The message is a bit misleading by the way, since we limit x to 64 bits, and therefore the result is independent of a >= 2^32. If at all, I'd also give the numerical upper limit for a, i.e. 203280221. But as I said, there isn't really any limit on a.)

Of course, I wrote the method as a wrapper for phi, so I was mainly thinking of what I needed to call that method. I can easily remove this bound.

Also, don't try this at home:

sage: time legendre_phi(1000,10^8)


(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)

When fixing the above issue, I can fix this. This same issue will appear for very large x for prime_pi, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to 2**32-1 (my code should use ~1GB at this size). I can implement a custom sieve if it this poses much of an issue (eventually though something needs to be done to fix prime_range).

### comment:47 in reply to: ↑ 46 ; follow-ups: ↓ 48 ↓ 49 Changed 9 years ago by leif

(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)

When fixing the above issue, I can fix this. This same issue will appear for very large x for prime_pi, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to 2**32-1.

Ouch. (Or should it read 1.7 GB, with ~8 bytes per prime? Otherwise there would be a real hardware limit - max. 2 or 4 GB address space - on 32-bit machines and OSs I haven't yet met during the tests.)

I thought the huge memory consumption was mostly the fault of prime_range() (and Python), and to some extent of having up to three or four lists or arrays of primes at the same time (PARI + Python + perhaps two Cython arrays while copying).

Would using Python ints (py_ints=True parameter to prime_range()) alleviate the situation in any way?

One could also request the primes in chunks rather than the whole interval at once, or implement some generator class or function that calls PARI.

Using sage_realloc() would also reduce the (peak) physical memory requirement.

I can implement a custom sieve if this poses much of an issue (eventually though something needs to be done to fix prime_range).

Hmmm. I thought we could avoid using prime_range() later, on a follow-up, e.g. by directly accessing PARI from Cython. But feel free to implement or use some other sieve / function.

At least potentially freezing the machine (which IMHO is an OS problem) isn't very user-friendly, especially in educational software I think...

### comment:48 in reply to: ↑ 47 Changed 9 years ago by leif

This same issue will appear for very large x for prime_pi, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to 2**32-1.

That might have been the case with older PARI versions; if I do

sage: time pari.init_primes(2^32)
Time: CPU 13.42 s, Wall: 18.47 s


the whole process just occupies about 283 MB (89 MB thereof basic Sage). And yes, all have been computed, as one can "verify" with e.g. the time it takes to get some primes around the current "end":

sage: prime_range(2^32,2^32+100) # immediate answer if init_primes(2^32) has been called in advance
[4294967311, 4294967357, 4294967371, 4294967377, 4294967387, 4294967389]


The fatal things are commands like this:

sage: prime_range(2^30)[-1]


which turns the relative prime offsets stored by PARI (hardly more than one byte per prime on average) into a list of boxed integers.

### comment:49 in reply to: ↑ 47 Changed 9 years ago by ohanar

Using sage_realloc() would also reduce the (peak) physical memory requirement.

How long has sage had sage_realloc? I could have sworn that it didn't exist a year ago.

I can implement a custom sieve if this poses much of an issue (eventually though something needs to be done to fix prime_range).

Hmmm. I thought we could avoid using prime_range() later, on a follow-up, e.g. by directly accessing PARI from Cython. But feel free to implement or use some other sieve / function.

I have am working on v10, which will primarily change the __sieve function to directly access PARI and use sage_realloc. I have this working on 64-bit linux right now:

sage: get_memory_usage()
844.1484375
sage: time legendre_phi(1000,10^8)
1
Time: CPU 22.55 s, Wall: 22.54 s
sage: get_memory_usage()
1406.78515625


Obviously I haven't made all the changes that I need to with legendre_phi, however it removes the potentially insane memory issues with using prime_range. Worst case scenario for storage now:

sage: time legendre_phi(1000, 203280221)
1
Time: CPU 46.85 s, Wall: 46.86 s
sage: get_memory_usage()
1975.58203125


### comment:50 Changed 9 years ago by leif

Just for the record:

The second 32-bit machine just successfully computed prime_pi(2^58) (still patch version 6), which took about ten days on an already fully loaded system.

### Changed 9 years ago by ohanar

rewrote sieve (now goes directly through pari) + some significant changes to legendre_phi

### comment:51 Changed 9 years ago by ohanar

Been busy, but I finally managed to get another patch (after running into quite a number of issues). This has some pretty significant changes -- specifically to legendre_phi and __sieve.

legendre_phi now has keyboard interrupt support :). Also, it removes the limit to a as well as make some other improvements to speed for various special cases.

__sieve was rewritten to remove the use of prime_range and now works directly through PARI. Also sage_realloc was added (I wasn't aware that sage had realloc, I could have sworn that I tried using it before with no luck).

Apply only trac_11475v10.patch

### comment:52 Changed 9 years ago by leif

• Description modified (diff)

### Changed 9 years ago by leif

Sage library patch. Apply on top of Andrew's v10. (Actually also based on Sage 4.7.1.alpha4.)

### comment:53 Changed 9 years ago by leif

• Description modified (diff)

I've added a reviewer patch to Andrew's version 10:

• PARI's diffptr is uint8_t* (actually unsigned char* or byteptr); using uint_fast8_t* there would fail on any machine on which sizeof(uint8_t) != sizeof(uint_fast8_t).
• Use the more portable ptrdiff_t for "__diffptrOff" (which will then also be smaller on 32-bit machines).
• Change order of libraries in module_list.py, as PARI also depends on GMP, to support dumber linkers (necessary for at least Cygwin I guess).
• stdint.h is C99, so add the corresponding compiler flag.

The superfluous + is still in, missed to fix that in my reviewer patch.

I haven't yet really looked at the rest of the new code (i.e., v10), only performed the standard tests (-long) on both Ubuntu 9.04 x86 and x86_64 (Sage 4.7 and Sage 4.7.1.alpha4, respectively), which passed (seemingly slightly faster, but I may be wrong ;-) ).

The module_list.py part of Andrew's v10 patch applies to Sage 4.7.1.alpha4, too, but with an offset of -26 lines, so should perhaps be rebased.

(Unfortunately my second 32-bit machine apparently died, on which I mainly tested prime_pi() for larger values previously.)

### comment:54 Changed 9 years ago by ohanar

• Description modified (diff)

So how the prime list is created and managed was bothering me. Specifically, its initiation code uses a bit of a hack to figure out whether or not it was called by init_tables and also uses pari(n).primepi(), which without #11741 can be quite buggy for larger n; it can consume a rather large amount of memory if the user gives a large input; and the user does not have any option on how big it is.

So I've completely rewritten the initiation method, it now recursively calls the pi method to determine the length of the prime list. I've also completely I've removed the caching part of the code, since we can quickly copy the primes over from pari's list, and instead added an optional second argument that allows the user to use additional memory if so desired. I've also tweaked the size of the smallPi list to give another significant speedup (roughly 50% faster than the last patch), as well as made a couple of simplifications in phi32.

Beyond those changes, I've refactored a lot of code, replaced cython code with python for the non-critical sections in order to improve readability, reverted the plot method to something very close to its original state (since we are no longer caching), and folded in Leif's reviewer patch. The code is now 552 lines versus the previous 625.

### comment:55 Changed 9 years ago by davidloeffler

Apply trac_11475mk2.patch

(for patchbot)

### comment:56 Changed 9 years ago by davidloeffler

• Status changed from needs_review to needs_work

According to the patchbot logs there are some issues with documentation formatting:

docstring of sage.functions.prime_pi.legendre_phi:39: WARNING: Duplicate explicit target name: "rao2011".
docstring of sage.functions.prime_pi.partial_sieve_function:39: WARNING: Duplicate explicit target name: "rao2011".
docstring of sage.functions.prime_pi.legendre_phi:39: WARNING: duplicate citation RAO2011, other instance in /storage/masiao/sage-5.0.beta8/devel/sage/doc/en/reference/sage/functions/prime_pi.rst
docstring of sage.functions.prime_pi.partial_sieve_function:39: WARNING: duplicate citation RAO2011, other instance in /storage/masiao/sage-5.0.beta8/devel/sage/doc/en/reference/sage/functions/prime_pi.rst


This is basically because there should be only one copy of the bibliography entry in the file, rather than having it afresh in every docstring that cites it.

Also, shouldn't the class PrimePi derive from sage.symbolic.function.BuiltinFunction?, rather from the base Object class?

### comment:57 Changed 9 years ago by ohanar

• Status changed from needs_work to needs_review

Ok, both issues should be resolved -- it should now work properly in the symbolic ring.

### comment:58 Changed 9 years ago by ohanar

Apply trac_11475mk2.patch

### comment:59 Changed 8 years ago by was

• Status changed from needs_review to positive_review

works good to me.

### comment:60 Changed 8 years ago by kcrisman

On the plus side, this fixes some of #6876. On the down side:

WARNING: we draw the plot of prime_pi as a stairstep function with
explicitly drawn vertical lines where the function jumps. Technically
Technically there should not be any vertical lines, but they make the
graph look much better, so we include them. Use the option
vertical_lines=False to turn these off.


I'm adding a fix for the latter, and a doctest for the former. Shouldn't affect anything.

### comment:61 Changed 8 years ago by kcrisman

• Reviewers changed from Yann Laigle-Chapuy, Leif Leonhardy to Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman

### comment:62 Changed 8 years ago by kcrisman

• Description modified (diff)

Patchbot: Apply trac_11475mk2.patch and trac_11475-reviewer.patch.

### comment:63 Changed 8 years ago by kcrisman

William pointed out that this didn't fix #6876 quite how we want, so just fixing the typo.

### comment:64 Changed 8 years ago by jdemeyer

• Status changed from positive_review to needs_work

This fails on arando (Linux Ubuntu 12.04 i686, 32 bit):

sage -t  --long -force_lib devel/sage/sage/functions/prime_pi.pyx
**********************************************************************
File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.1.beta4/devel/sage-main/sage/functions/prime_pi.pyx", line 123:
sage: prime_pi(10^10)
Expected:
455052511
Got:
70462980
**********************************************************************
File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.1.beta4/devel/sage-main/sage/functions/prime_pi.pyx", line 231:
sage: for i in (32..42): prime_pi(2^i) # long time (13s on sage.math, 2011)
Expected:
203280221
393615806
762939111
1480206279
2874398515
5586502348
10866266172
21151907950
41203088796
80316571436
156661034233
Got:
0
0
0
0
0
0
0
0
0
0
0
**********************************************************************


### comment:65 Changed 8 years ago by jdemeyer

On hawk (OpenSolaris i386), it actually Segmentation Faults.

### comment:66 Changed 8 years ago by ohanar

• Status changed from needs_work to needs_review

### comment:67 Changed 8 years ago by kini

patchbot: apply trac_11475mk2.patch trac_11475-reviewer.patch

### comment:68 Changed 8 years ago by kini

patchbot: apply trac_11475mk2.patch trac_11475-reviewer.patch

### comment:69 Changed 8 years ago by kini

• Status changed from needs_review to positive_review

Jeroen: now it passes on arando. Can you check on hawk?

### comment:70 Changed 8 years ago by jdemeyer

• Milestone changed from sage-5.1 to sage-5.2

### comment:71 Changed 8 years ago by jdemeyer

• Merged in set to sage-5.2.beta1
• Resolution set to fixed
• Status changed from positive_review to closed

### comment:72 Changed 8 years ago by vbraun

Why does this ticket make random whitespace changes to sage/functions/all.py? This is total BS. This just breaks patches on trac and wastes other people's time. For example #11143 doesn't apply any more.

### comment:73 Changed 8 years ago by jdemeyer

• Merged in sage-5.2.beta1 deleted
• Milestone changed from sage-5.2 to sage-5.3
• Resolution fixed deleted
• Status changed from closed to new

On bsd.math (OS X 10.6 x86_64), I still get

sage -t  --long -force_lib devel/sage/sage/modular/abvar/torsion_subgroup.py
**********************************************************************
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/devel/sage-main/sage/modular/abvar/torsion_subgroup.py", line 316:
sage: G.multiple_of_order()
Exception raised:
Traceback (most recent call last):
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test
self.run_one_example(test, example, filename, compileflags)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/sagedoctest.py", line 38, in run_one_example
OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example
compileflags, 1) in test.globs
File "<doctest __main__.example_9[7]>", line 1, in <module>
G.multiple_of_order()###line 316:
sage: G.multiple_of_order()
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/torsion_subgroup.py", line 375, in multiple_of_order
f = A.hecke_polynomial(p)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/abvar.py", line 2239, in hecke_polynomial
f = self._compute_hecke_polynomial(n, var=var)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/abvar.py", line 3800, in _compute_hecke_polynomial
return sqrt_poly(self.modular_symbols().hecke_polynomial(n, var))
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/module.py", line 1460, in hecke_polynomial
return self.hecke_operator(n).charpoly(var)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/hecke_operator.py", line 274, in charpoly
return self.matrix().charpoly(var)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/hecke_operator.py", line 747, in matrix
self.__matrix = self.parent().hecke_matrix(self.__n, *args, **kwds)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/algebra.py", line 589, in hecke_matrix
return self.__M.hecke_matrix(n, *args, **kwds)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/module.py", line 1351, in hecke_matrix
T = self._compute_hecke_matrix(n)
File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/submodule.py", line 239, in _compute_hecke_matrix
return A.restrict(self.free_module(), check=check)
File "matrix2.pyx", line 4202, in sage.matrix.matrix2.Matrix.restrict (sage/matrix/matrix2.c:23293)
File "element.pyx", line 2736, in sage.structure.element.Matrix.__mul__ (sage/structure/element.c:19170)
File "coerce.pyx", line 740, in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:6778)
File "action.pyx", line 147, in sage.matrix.action.MatrixMatrixAction._call_ (sage/matrix/action.c:3098)
File "matrix_rational_dense.pyx", line 1047, in sage.matrix.matrix_rational_dense.Matrix_rational_dense._matrix_times_matrix_ (sage/matrix/matrix_rational_dense.c:12115)
File "matrix_rational_dense.pyx", line 1090, in sage.matrix.matrix_rational_dense.Matrix_rational_dense._multiply_over_integers (sage/matrix/matrix_rational_dense.c:12248)
RuntimeError: Segmentation fault
**********************************************************************


### comment:74 follow-up: ↓ 75 Changed 8 years ago by ohanar

Do either of these cause a segfault?:

sage: prime_pi(2**15)
3512
sage: prime_pi(2**10-1)
172


Those are the only two calls made to prime_pi made by that doctest.

### comment:75 in reply to: ↑ 74 Changed 8 years ago by jdemeyer

Do either of these cause a segfault?:

sage: prime_pi(2**15)
3512
sage: prime_pi(2**10-1)
172


No.

real patch

### comment:76 Changed 8 years ago by ohanar

• Status changed from new to needs_review

Ok, OSX issue should now be fixed (all tests pass for me on bsd.math now).

### comment:77 Changed 7 years ago by jdemeyer

• Milestone changed from sage-5.11 to sage-5.12

### comment:78 Changed 7 years ago by ohanar

• Branch set to u/ohanar/prime_pi
• Dependencies set to #15138

this patch has bit-rotted, I'll upload a git branch in a moment

### comment:79 Changed 7 years ago by git

• Commit set to e3e33e62cc19bf93b5236e7d1177482bd52b8a9e

Branch pushed to git repo; I updated commit sha1. New commits:

 [changeset:e3e33e6] Trac 11475 reviewer patch - very minor prime_pi plot doc improvement [changeset:a3a0edd] fix doctest [changeset:ef57887] remove trailing \'s and other little cleanup [changeset:ed701d3] prime_pi: fix off by 1 mallocing issue [changeset:bc36b43] 5x speedup to prime_pi

### comment:80 Changed 7 years ago by vbraun_spam

• Milestone changed from sage-6.1 to sage-6.2

### comment:81 Changed 6 years ago by vbraun_spam

• Milestone changed from sage-6.2 to sage-6.3

### comment:82 Changed 6 years ago by pbruin

• Status changed from needs_review to positive_review

The patchbot experienced some glitches on this ticket, so I tested it again; all tests pass and the documentation builds. From the comments, it appears that the ticket was already closed, then reopened because of a segfault on OS X 10.6 x86_64, and that this bug was subsequently fixed. Hence let's put it back to positive review.

### comment:83 Changed 6 years ago by vbraun

• Branch changed from u/ohanar/prime_pi to e3e33e62cc19bf93b5236e7d1177482bd52b8a9e
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.