Opened 8 years ago
Closed 6 years ago
#11475 closed enhancement (fixed)
improve prime_pi (speedup + small fixes)
Reported by: | ohanar | Owned by: | was |
---|---|---|---|
Priority: | major | Milestone: | sage-6.3 |
Component: | number theory | Keywords: | primes, prime counting, prime_pi sd40.5 |
Cc: | was, leif, kevin.stueve, victor, kcrisman | Merged in: | |
Authors: | R. Andrew Ohana | Reviewers: | Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman |
Report Upstream: | N/A | Work issues: | |
Branch: | e3e33e6 (Commits) | Commit: | e3e33e62cc19bf93b5236e7d1177482bd52b8a9e |
Dependencies: | #15138 | Stopgaps: |
Description (last modified by )
I have rewritten the prime_pi
method from scratch, it is much cleaner and less hacky this time (as I have learned more about coding), however it is still based on the same algorithm as before (no LMO yet, although I intend to take a stab at it later this year). This was developed in parallel with a method to count primes in residue classes (ticket forthcoming).
The overhead for small input >= 2
is rather large in the current prime_pi
, this has been dramatically improved:
New:
sage: timeit('prime_pi(100)') 625 loops, best of 3: 475 ns per loop sage: timeit('prime_pi(10000)') 625 loops, best of 3: 483 ns per loop
Old:
sage: timeit('prime_pi(100)') 625 loops, best of 3: 1.71 µs per loop sage: timeit('prime_pi(10000)') 625 loops, best of 3: 5.56 µs per loop
There is also a ~5x speedup (conservatively) for larger input:
New:
sage: timeit('prime_pi(10**8)') 625 loops, best of 3: 530 µs per loop sage: timeit('prime_pi(10**10)') 5 loops, best of 3: 27.1 ms per loop sage: time prime_pi(10**12) 37607912018 Time: CPU 1.53 s, Wall: 1.53 s
Old:
sage: timeit('prime_pi(10**8)') 125 loops, best of 3: 3.66 ms per loop sage: timeit('prime_pi(10**10)') 5 loops, best of 3: 161 ms per loop sage: time prime_pi(10**12) 37607912018 Time: CPU 8.65 s, Wall: 8.64 s
The user can also give a second argument to use additional memory for a potential speedup. All primes less than the second argument are used in computing pi(x). For example:
sage: time prime_pi(10**12, 10**8) 37607912018 Time: CPU 0.92 s, Wall: 0.92 s sage: time prime_pi(10**12, 10**8) # pari's prime table is now cached 37607912018 Time: CPU 0.58 s, Wall: 0.58 s
An earlier version of this implementation started to fail to give correct output on 64-bit systems somewhere between 9.5*10**15
and 9.75*10**15
, for issues that I had previously been unable to determine, so I had initially limited input to be < 2**49
, which I am fairly confident is safe, as I have done a fair bit of testing in this range trying to debug this issue (despite it taking hours per call). I would appreciate testing on 32-bit platforms.
A lot of the speed of this algorithm relies on the __cached_count
method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes). This is the fastest type of binary search I know of, but if anyone knows of a way to speed it up, it could have a significant effect.
The other big contributor to the speed is the __smallPi
table, which stores the values of pi(x) for x < 2**16
. This is used in the same manner as __cached_count
, but is of course much faster.
Finally, I have introduced the legendres_formula
method, which takes two arguments, x
and a
, and returns the number of positive integers not larger than x
that are not divisible by any -- i.e. co-prime to each -- of the first a
primes. This function is core to all combinatorial methods for computing the prime counting function, so I figured that I would make this publicly accessible now that I have clean code.
Apply trac_11475mk2.patch and trac_11475-reviewer.patch to the Sage library.
Attachments (12)
Change History (95)
Changed 8 years ago by
comment:1 Changed 8 years ago by
- Description modified (diff)
- Status changed from new to needs_review
comment:2 follow-up: ↓ 5 Changed 8 years ago by
The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )
Haven't yet looked a the rest...
comment:3 Changed 8 years ago by
comment:4 follow-up: ↓ 6 Changed 8 years ago by
A lot of the speed of this algorithm relies on the cached_count method, which is essentially a binary search (with a simple adjustment in the beginning that takes advantage of the density of the primes)
Something like this should give you a 10% speedup for big inputs.
if p < 200: pos = p>>2 size = 5 else: size = self.__numPrimes>>1 # Use the expected density of primes for expected inputs to make an # educated guess if p>>3 < size: size = p>>3 pos = size
(beware to make sure there are enough primes cached for small inputs)
comment:5 in reply to: ↑ 2 Changed 8 years ago by
comment:6 in reply to: ↑ 4 Changed 8 years ago by
Replying to leif:
The link(s) to your paper is (currently) dead. (I wouldn't call it 'main' btw. ;-) )
Sorry about that, I was planning on moving the folder, but I forgot to. I'll fix that tonight.
Replying to ylchapuy:
Something like this should give you a 10% speedup for big inputs.
Thanks, I'll try it out tonight, and update the description and patch if it proves fruitful.
comment:7 Changed 8 years ago by
- Description modified (diff)
comment:8 follow-up: ↓ 9 Changed 8 years ago by
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?
(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)
comment:9 in reply to: ↑ 8 Changed 8 years ago by
Replying to leif:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
I did most of my testing on mod.math and sage.math, which is ubuntu linux (64-bit). I don't have failing results any longer, but I can tell you that I discovered the issue with 10**16
, and I also know that it fails for 9.75*10**15
. Give me 7-8 hours and I'll get you the output for those inputs.
While the old implementation was said to fail on (at least some) 32-bit systems for large inputs, the new one in contrast fails on 64-bit systems?
I don't know how the new one behaves on 32-bit systems, since I have done literally no testing. I hope it does better than the old one, since I used stdint this time around (and the only real difference between my 64-bit vs 32-bit implementation is the use of pointers in the later).
(More or less just curious; perhaps I'll one dayTM try to track this down, depending on the affected system(s) as well.)
comment:10 follow-up: ↓ 11 Changed 8 years ago by
Apply only trac_11475v2.patch
comment:11 in reply to: ↑ 10 Changed 8 years ago by
- Description modified (diff)
comment:12 Changed 8 years ago by
- Description modified (diff)
comment:13 follow-up: ↓ 32 Changed 8 years ago by
Apply only trac_11475v3.patch
comment:14 Changed 8 years ago by
- Description modified (diff)
comment:15 Changed 8 years ago by
- Cc kcrisman added
comment:16 follow-up: ↓ 17 Changed 8 years ago by
Replying to leif:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
sage: time prime_pi(9.75*10**15) 272450165623660 Time: CPU 25614.37 s, Wall: 25614.47 s
The correct value is 272450167482953
(http://www.ieeta.pt/~tos/primes.html). I also computed prime_pi(10**16)
but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).
comment:17 in reply to: ↑ 16 Changed 8 years ago by
Replying to ohanar:
Replying to leif:
Andrew, do you have some examples of wrong prime counts (beyond 249), and if so, on what kind of systems does one experience them?
sage: time prime_pi(9.75*10**15) 272450165623660 Time: CPU 25614.37 s, Wall: 25614.47 s
The correct value is
272450167482953
(http://www.ieeta.pt/~tos/primes.html). I also computedprime_pi(10**16)
but in a moment of confusion, I killed the process before I grabbed the value. If you want it I can start running it again with my current version of the code (~3-5% faster than the latest patch).
Never mind. I thought you had collected a bunch of false results during development. If (apparently) all platforms are affected, it should be easier to track this down - rather than flaws due to e.g. Apple "specifics"... ;-)
Btw, I have a large collection of correct prime counts here, so if eventually the sysloads drop, I'll by myself check some results against your new implementation.
Anyway, anybody feel free to compute more values above our current limit and post them here, whether correct or not. :-)
comment:18 Changed 8 years ago by
- Description modified (diff)
comment:19 Changed 8 years ago by
An update:
v4: added a table of prime_pi
values that can be stored in a byte -- using this with the __cached_primes
method gave a giant speedup. Since the sum at the beginning of __phi
(and __phi32
) differs from 16*(x/77)
by a small factor that only depends on x%2310
, so adding a table with these factors results in another large speedup. These tables take a total of ~4kB.
v5: the tables introduced a significant initialization time for prime_pi
, and William asked me to fix this, so prime_pi
(and legendres_formula) only initialize these tables on an initial call -- so initialization time is now ~650ns:
sage: timeit('sage.functions.prime_pi.PrimePi()') 625 loops, best of 3: 627 ns per loop
Both Yann and I have tested 9.75*10^15
again and got the correct result, I also did 10^16
. We both ran it on 64-bit GNU/Linux systems. So the issue I was encountering may have been due to boundary issues with the __cached_primes
method (which have been rewritten over the course of the patches). I kept the limit in since I haven't yet tested this thoroughly.
comment:20 Changed 8 years ago by
- Description modified (diff)
comment:21 follow-up: ↓ 22 Changed 8 years ago by
I've so far only tested a little (on 64-bit Linux; dead old 32-bit box still building Sage), but already found one wrong result with patch v3:
502000000000000 15296923904889 # 11475v3 502000000000000 15296922045596 # correct result, (at least) double-checked
Note that 502*1012 is below 249 (562.949.953.421.312).
As my machines are still busy, you could perhaps check if the newer version(s) give the correct result.
comment:22 in reply to: ↑ 21 ; follow-up: ↓ 23 Changed 8 years ago by
Works on sage.math with 11475v5:
sage: time prime_pi(502000000000000) 15296922045596 Time: CPU 723.40 s, Wall: 723.43 s
comment:23 in reply to: ↑ 22 Changed 8 years ago by
Replying to ohanar:
Works on sage.math with 11475v5:
sage: time prime_pi(502000000000000) 15296922045596 Time: CPU 723.40 s, Wall: 723.43 s
Yep, here too; with both v4 and v5 (64-bit Linux).
comment:24 follow-up: ↓ 25 Changed 8 years ago by
580000000000000 17596211452162 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 581000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 582000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 583000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 584000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 585000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 586000000000000 0
Hmm, really?
comment:25 in reply to: ↑ 24 Changed 8 years ago by
Replying to leif:
580000000000000 17596211452162 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 581000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 582000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 583000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 584000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 585000000000000 0 ^CException KeyboardInterrupt in 'sage.functions.prime_pi.PrimePi.__pi' ignored 586000000000000 0Hmm, really?
Oops, thankfully easily fixed. Shouldn't occur in the next patch (whenever that is).
comment:26 follow-up: ↓ 27 Changed 8 years ago by
- Status changed from needs_review to needs_work
Ok, here are my first 32-bit tests (with patch v5):
sage -t "devel/sage-11475v5/sage/functions/prime_pi.pyx" ********************************************************************** File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126: sage: prime_pi(10**10) Expected: 455052511 Got: 498132213 ********************************************************************** 1 items had failures: 1 of 14 in __main__.example_1 ***Test Failed*** 1 failures. For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py [6.1 s] ---------------------------------------------------------------------- The following tests failed: sage -t "devel/sage-11475v5/sage/functions/prime_pi.pyx" Total time for all tests: 6.1 seconds real 0m6.293s user 0m5.328s sys 0m0.760s
With long tests included, some unhandled overflows occur:
sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx" Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored Exception OverflowError: 'long int too large to convert to int' in 'sage.functions.prime_pi.PrimePi.__sieve' ignored ********************************************************************** File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 126: sage: prime_pi(10**10) Expected: 455052511 Got: 498132213 ********************************************************************** File "/home/leif/Sage/sage-4.7/devel/sage-11475v5/sage/functions/prime_pi.pyx", line 231: sage: for n in (42,41..32): prime_pi(2**n) # long time (21s on mod.math, 2011) Expected: 156661034233 80316571436 41203088796 21151907950 10866266172 5586502348 2874398515 1480206279 762939111 393615806 203280221 Got: 2072485352 555689055 4249350341 1916569322 829105263 2484680881 3345496393 1649125207 813719441 403824770 203621732 ********************************************************************** 2 items had failures: 1 of 14 in __main__.example_1 1 of 14 in __main__.example_4 ***Test Failed*** 2 failures. For whitespace errors, see the file /home/leif/.sage//tmp/.doctest_prime_pi.py [52.1 s] ---------------------------------------------------------------------- The following tests failed: sage -t -long "devel/sage-11475v5/sage/functions/prime_pi.pyx" Total time for all tests: 52.2 seconds real 0m52.318s user 0m49.923s sys 0m0.956s
This is on Ubuntu 7.10 :) (P4 Northwood, gcc 4.2.1) with Sage 4.7.
comment:27 in reply to: ↑ 26 ; follow-up: ↓ 28 Changed 8 years ago by
Unfortunate, but kind of expected. It'll probably take a day or two for me to set up a 32-bit vm to try to debug the issue (it isn't blatantly obvious to me what is going wrong).
comment:28 in reply to: ↑ 27 Changed 8 years ago by
Replying to ohanar:
(it isn't blatantly obvious to me what is going wrong).
Yep, especially the "long int doesn't fit into int" reminds me of 16-bit systems... 8/
comment:29 Changed 8 years ago by
- Description modified (diff)
- Status changed from needs_work to needs_review
comment:30 follow-up: ↓ 31 Changed 8 years ago by
Hopefully that was the only issue with 32-bit systems. Basically gmp wants a 32-bit int for mpz_set_ui on 32-bit systems, so I gave it what it wanted (this was also causing the issue with the __sieve
function, since I use the mpz_sqrt
function to calculate the argument).
I am currently doing some more thorough testing with large values on a 64-bit machine, so that we can be sure there isn't any more breakage on the stable platform.
comment:31 in reply to: ↑ 30 Changed 8 years ago by
Replying to ohanar:
Hopefully that was the only issue with 32-bit systems.
v6 passes long tests on the 32-bit Ubuntu 7.10 machine (in about a minute). More to come.
comment:32 in reply to: ↑ 13 Changed 8 years ago by
Apply only trac_11475v6.patch
comment:33 follow-up: ↓ 35 Changed 8 years ago by
- Status changed from needs_review to needs_work
I have verified that prime_pi
gives correct output for i*10**12
with 0 <= i <= 1000
on x86_64 :). I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Unfortunately, I have also come across an issue with solaris 10 sparc (t2.math). The root of the cause is that mpz_export
has the bit ordering arguments flipped. I am not aware of another way to get unsigned ints out of gmp integers, nor am I aware of a way to check the platform from cython. Also, I don't know if other platforms have this same issue (solaris86 or mac powerpc in particular), so if someone could test that, please let me know.
comment:34 Changed 8 years ago by
- Status changed from needs_work to needs_review
Apply only trac_11475v8.patch
comment:35 in reply to: ↑ 33 ; follow-ups: ↓ 36 ↓ 37 Changed 8 years ago by
Replying to ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).
Besides a couple of other values on individual machines, I've successfully computed pi(2k) for k=0...53 and pi(10k) for k=0...16 on each of three different machines (two of them 32-bit; all with patch version 6).
Some computations are still in progress, but so far I could also verify the results of prime_pi(2**54)
and prime_pi(2**55)
(on at least one machine). I've also played a little with "legendres_formula()
".
I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...
I wonder if legendres_formula()
shouldn't be renamed to legendre_phi()
- better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, and legendre_phi
would be analogous to e.g. euler_phi
in Sage. (I know Wolfram and others call "it" - i.e., IMHO the method computing the respective function rather than the function itself - Legendre's Formula as well, but it's not that unambiguous.)
I'd also mention in its docstring it's called the partial sieve function.
comment:36 in reply to: ↑ 35 Changed 8 years ago by
Replying to leif:
Replying to ohanar:
I am probably going to remove the bound flag and instead put a warning about larger values not being well tested if/when they are called.
Yes, I also think we can or should drop any "artificial" limit now, though I must admit I haven't really looked at the code yet...
Ah, I see, you already removed the restriction in patch v8 (and also added warnings w.r.t. seemingly "endless" computations, which I also wanted to suggest).
There's a small typo in __call__()
's docstring (the 2**32
should read 2**64
):
This implementation uses unsigned 64-bit ints and thus does not support `x >= 2**32`:: sage: prime_pi(2^64) Traceback (most recent call last): ... NotImplementedError: computation of prime_pi for x >= 2^64 is not implemented
Also, there's a mixture of both kinds of exponentiation operators (^
and **
) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.
comment:37 in reply to: ↑ 35 ; follow-up: ↓ 38 Changed 8 years ago by
Replying to leif:
Replying to ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).Besides a couple of other values on individual machines,
Which machines have you had issues with? Everything I have tested with v8 has worked (this includes x86 and x86_64 linux, 64bit Snow Leopard, and Sparc Solaris 10).
I wonder if
legendres_formula()
shouldn't be renamed tolegendre_phi()
- better names appreciated, but the latter ( \phi(x,a) ) seems to be quite established in literature, andlegendre_phi
would be analogous to e.g.euler_phi
in Sage.
Makes sense to me, if no one has an objection to the idea, I'll make the change.
I'd also mention in its docstring it's called the partial sieve function.
Sounds good to me.
Replying to leif:
Also, there's a mixture of both kinds of exponentiation operators (
^
and**
) throughout the [doc]strings; I'm not sure if we shouldn't use just one notation.
Will do -- probably ^
since it is more consumer friendly.
comment:38 in reply to: ↑ 37 Changed 8 years ago by
Replying to ohanar:
Replying to leif:
Replying to ohanar:
I have verified that
prime_pi
gives correct output fori*10**12
with0 <= i <= 1000
on x86_64 :).Besides a couple of other values on individual machines,
Which machines have you had issues with?
Oh, I didn't get any wrong results; I just meant I've also tested prime_pi(x)
with some "random" values (besides the mentioned powers of 2 and ten), but not systematically on all machines (all Linux boxes; Pentium4 Northwood, P4 Prescott and Core2).
comment:39 Changed 8 years ago by
- Description modified (diff)
Computation of prime_pi(2**56)
just finished with the correct result on one of the 32-bit machines :-) (took 2 days+).
[Updated description w.r.t. (now obsolete) limitation / potentially wrong results for inputs exceeding 249 with earlier versions.]
comment:40 Changed 8 years ago by
So through the updates, I haven't been keeping legendres_formula
up to date. I am posting a patch that should fix the bugs that were present for the past few versions (primarily, calling it with a relatively small x
in comparison to a
would give an extremely wrong answer due to overflow issues). I have also changed the name, so it now is called either legendre_phi
or partial_sieve_function
.
The patch also makes all documentation use ^
, there are no longer any instances of **
.
The patch also touches up on __phi
and __phi32
-- cleaning up code, adding a few comments, and makes a simple optimization to reduce boolean checks for small x
values (~10% speedup for large input).
I'm hoping this will be the final patch for this ticket, so that I don't go into the double digits for version numbers :-).
Apply only trac_11475v9.patch
comment:41 Changed 8 years ago by
- Description modified (diff)
The new version comes in the right moment as the slow 32-bit machine just finished computing prime_pi(2^57)
(successfully; still with v6 though). No separate timings, but pi(254) ... pi(257) took 7.5 days in total. :-)
At first glance, there are only a few typos:
- There's a superfluous
+
in the computation ofself.__tabS[i]
(continuation line). - The 249 remains renitent:
cdef uint64_t __pi(self, uint64_t x) except -1: r""" Returns :math:`\pi(x)` and assumes :math:`self.__maxSieve < x < 2^{49}` """
s/
number of primes/
number of positive integers/
:cdef uint64_t __phi(self, uint64_t x, uint64_t i): r""" Legendre's formula: returns the number of primes :math:`\leq` ``x`` that are not divisible by the first ``i`` primes """
comment:42 Changed 8 years ago by
- Reviewers set to Yann Laigle-Chapuy, Leif Leonhardy
So far passed all [long] tests on Ubuntu 7.10 x86 (Sage 4.7) and Ubuntu 9.04 x86_64 (Sage 4.7.1.alpha3; patch to module_list.py
applies with 1 line offset).
That's of course a bit funny (I don't mind):
sage: legendre_phi(2^63,0) computation of legendre_phi for large x can take minutes, hours, or days depending on the size of x computation of legendre_phi for x >= 10^15 has not been thoroughly tested, be cautious of the result 9223372036854775808 sage:
We may also add "or weeks", but the current version is faster than the previous ones and much faster than the current version in Sage of course (not to mention its odd limitation).
There are opportunities for further improvements (e.g. copying the whole list of primes when we need more; perhaps some more comments / documentation), but these can IMHO be left to follow-up tickets.
In principle positive review from me (minus the typos), just hesitating to give it at this point without having studied the code more thoroughly.
comment:43 follow-ups: ↓ 44 ↓ 46 Changed 8 years ago by
Another corner case:
sage: legendre_phi(1000,10^10) --------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) /home/leif/Sage/sage-4.7.1.alpha3/devel/sage-11475v9/<ipython console> in <module>() /home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5888)() /home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5517)() NotImplementedError: computing legendre_phi for a > prime_pi(2^32-1) not implemented sage:
(The message is a bit misleading by the way, since we limit x
to 64 bits, and therefore the result is independent of a >= 2^32
. If at all, I'd also give the numerical upper limit for a
, i.e. 203280221. But as I said, there isn't really any limit on a
.)
Also, don't try this at home:
sage: time legendre_phi(1000,10^8)
(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)
comment:44 in reply to: ↑ 43 Changed 8 years ago by
Replying to leif:
The message is a bit misleading by the way, since we limit
x
to 64 bits, and therefore the result is independent ofa >= 2^32
.
More precisely: independent of a >= prime_pi(2^32)
. ;-)
comment:45 Changed 8 years ago by
I'd also use int_fast8_t
and uint_fast8_t
instead of int8_t
and uint8_t
, since size doesn't really matter there:
-
sage/functions/prime_pi.pyx
diff --git a/sage/functions/prime_pi.pyx b/sage/functions/prime_pi.pyx
a b 32 32 include '../ext/stdsage.pxi' 33 33 include '../ext/interrupt.pxi' 34 34 35 from libc.stdint cimport int 8_t, uint8_t, uint32_t, uint64_t35 from libc.stdint cimport int_fast8_t, uint_fast8_t, uint32_t, uint64_t 36 36 from sage.rings.integer cimport Integer 37 37 from sage.rings.real_mpfr import RealLiteral, create_RealNumber, RR 38 38 from sage.rings.arith import nth_prime … … 134 134 cdef uint32_t __numPrimes 135 135 cdef uint32_t __maxSieve 136 136 cdef uint64_t __maxSieve64 137 cdef int 8_t[2310] __tabS138 cdef uint 8_t[1619] __smallPi137 cdef int_fast8_t[2310] __tabS 138 cdef uint_fast8_t[1619] __smallPi 139 139 140 140 def __dealloc__(self): 141 141 if self.__numPrimes:
comment:46 in reply to: ↑ 43 ; follow-up: ↓ 47 Changed 8 years ago by
Replying to leif:
Another corner case:
sage: legendre_phi(1000,10^10) --------------------------------------------------------------------------- NotImplementedError Traceback (most recent call last) /home/leif/Sage/sage-4.7.1.alpha3/devel/sage-11475v9/<ipython console> in <module>() /home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5888)() /home/leif/Sage/sage-4.7.1.alpha3/local/lib/python2.6/site-packages/sage/functions/prime_pi.so in sage.functions.prime_pi.legendre_phi (sage/functions/prime_pi.c:5517)() NotImplementedError: computing legendre_phi for a > prime_pi(2^32-1) not implemented sage:(The message is a bit misleading by the way, since we limit
x
to 64 bits, and therefore the result is independent ofa >= 2^32
. If at all, I'd also give the numerical upper limit fora
, i.e. 203280221. But as I said, there isn't really any limit ona
.)
Of course, I wrote the method as a wrapper for phi, so I was mainly thinking of what I needed to call that method. I can easily remove this bound.
Also, don't try this at home:
sage: time legendre_phi(1000,10^8)(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)
When fixing the above issue, I can fix this. This same issue will appear for very large x
for prime_pi
, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to 2**32-1
(my code should use ~1GB at this size). I can implement a custom sieve if it this poses much of an issue (eventually though something needs to be done to fix prime_range
).
comment:47 in reply to: ↑ 46 ; follow-ups: ↓ 48 ↓ 49 Changed 8 years ago by
Replying to ohanar:
Replying to leif:
(I fortunately could kill the process before the almost freezed machine went out of swap space. Apparently computing the list of primes is uninterruptable, which worsens the situation.)
When fixing the above issue, I can fix this. This same issue will appear for very large
x
forprime_pi
, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to2**32-1
.
Ouch. (Or should it read 1.7 GB, with ~8 bytes per prime? Otherwise there would be a real hardware limit - max. 2 or 4 GB address space - on 32-bit machines and OSs I haven't yet met during the tests.)
I thought the huge memory consumption was mostly the fault of prime_range()
(and Python), and to some extent of having up to three or four lists or arrays of primes at the same time (PARI + Python + perhaps two Cython arrays while copying).
Would using Python ints (py_ints=True
parameter to prime_range()
) alleviate the situation in any way?
One could also request the primes in chunks rather than the whole interval at once, or implement some generator class or function that calls PARI.
Using sage_realloc()
would also reduce the (peak) physical memory requirement.
I can implement a custom sieve if this poses much of an issue (eventually though something needs to be done to fix
prime_range
).
Hmmm. I thought we could avoid using prime_range()
later, on a follow-up, e.g. by directly accessing PARI from Cython. But feel free to implement or use some other sieve / function.
At least potentially freezing the machine (which IMHO is an OS problem) isn't very user-friendly, especially in educational software I think...
comment:48 in reply to: ↑ 47 Changed 8 years ago by
Replying to leif:
Replying to ohanar:
This same issue will appear for very large
x
forprime_pi
, this can be blamed on PARI -- it stores the primes inefficiently and will use ~17GB for a sieve up to2**32-1
.
That might have been the case with older PARI versions; if I do
sage: time pari.init_primes(2^32) Time: CPU 13.42 s, Wall: 18.47 s
the whole process just occupies about 283 MB (89 MB thereof basic Sage). And yes, all have been computed, as one can "verify" with e.g. the time it takes to get some primes around the current "end":
sage: prime_range(2^32,2^32+100) # immediate answer if init_primes(2^32) has been called in advance [4294967311, 4294967357, 4294967371, 4294967377, 4294967387, 4294967389]
The fatal things are commands like this:
sage: prime_range(2^30)[-1]
which turns the relative prime offsets stored by PARI (hardly more than one byte per prime on average) into a list of boxed integers.
comment:49 in reply to: ↑ 47 Changed 8 years ago by
Replying to leif:
Using
sage_realloc()
would also reduce the (peak) physical memory requirement.
How long has sage had sage_realloc
? I could have sworn that it didn't exist a year ago.
I can implement a custom sieve if this poses much of an issue (eventually though something needs to be done to fix
prime_range
).Hmmm. I thought we could avoid using
prime_range()
later, on a follow-up, e.g. by directly accessing PARI from Cython. But feel free to implement or use some other sieve / function.
I have am working on v10, which will primarily change the __sieve
function to directly access PARI and use sage_realloc
. I have this working on 64-bit linux right now:
sage: get_memory_usage() 844.1484375 sage: time legendre_phi(1000,10^8) 1 Time: CPU 22.55 s, Wall: 22.54 s sage: get_memory_usage() 1406.78515625
Obviously I haven't made all the changes that I need to with legendre_phi
, however it removes the potentially insane memory issues with using prime_range
. Worst case scenario for storage now:
sage: time legendre_phi(1000, 203280221) 1 Time: CPU 46.85 s, Wall: 46.86 s sage: get_memory_usage() 1975.58203125
comment:50 Changed 8 years ago by
Just for the record:
The second 32-bit machine just successfully computed prime_pi(2^58)
(still patch version 6), which took about ten days on an already fully loaded system.
Changed 8 years ago by
rewrote sieve (now goes directly through pari) + some significant changes to legendre_phi
comment:51 Changed 8 years ago by
Been busy, but I finally managed to get another patch (after running into quite a number of issues). This has some pretty significant changes -- specifically to legendre_phi
and __sieve
.
legendre_phi
now has keyboard interrupt support :). Also, it removes the limit to a
as well as make some other improvements to speed for various special cases.
__sieve
was rewritten to remove the use of prime_range
and now works directly through PARI. Also sage_realloc
was added (I wasn't aware that sage had realloc
, I could have sworn that I tried using it before with no luck).
Apply only trac_11475v10.patch
comment:52 Changed 8 years ago by
- Description modified (diff)
Changed 8 years ago by
Sage library patch. Apply on top of Andrew's v10. (Actually also based on Sage 4.7.1.alpha4.)
comment:53 Changed 8 years ago by
- Description modified (diff)
I've added a reviewer patch to Andrew's version 10:
- PARI's
diffptr
isuint8_t*
(actuallyunsigned char*
orbyteptr
); usinguint_fast8_t*
there would fail on any machine on whichsizeof(uint8_t) != sizeof(uint_fast8_t)
. - Use the more portable
ptrdiff_t
for "__diffptrOff
" (which will then also be smaller on 32-bit machines). - Change order of libraries in
module_list.py
, as PARI also depends on GMP, to support dumber linkers (necessary for at least Cygwin I guess). stdint.h
is C99, so add the corresponding compiler flag.
The superfluous +
is still in, missed to fix that in my reviewer patch.
I haven't yet really looked at the rest of the new code (i.e., v10), only performed the standard tests (-long
) on both Ubuntu 9.04 x86 and x86_64 (Sage 4.7 and Sage 4.7.1.alpha4, respectively), which passed (seemingly slightly faster, but I may be wrong ;-) ).
The module_list.py
part of Andrew's v10 patch applies to Sage 4.7.1.alpha4, too, but with an offset of -26 lines, so should perhaps be rebased.
(Unfortunately my second 32-bit machine apparently died, on which I mainly tested prime_pi()
for larger values previously.)
comment:54 Changed 8 years ago by
- Description modified (diff)
So how the prime list is created and managed was bothering me. Specifically, its initiation code uses a bit of a hack to figure out whether or not it was called by init_tables and also uses pari(n).primepi()
, which without #11741 can be quite buggy for larger n; it can consume a rather large amount of memory if the user gives a large input; and the user does not have any option on how big it is.
So I've completely rewritten the initiation method, it now recursively calls the pi method to determine the length of the prime list. I've also completely I've removed the caching part of the code, since we can quickly copy the primes over from pari's list, and instead added an optional second argument that allows the user to use additional memory if so desired. I've also tweaked the size of the smallPi list to give another significant speedup (roughly 50% faster than the last patch), as well as made a couple of simplifications in phi32.
Beyond those changes, I've refactored a lot of code, replaced cython code with python for the non-critical sections in order to improve readability, reverted the plot method to something very close to its original state (since we are no longer caching), and folded in Leif's reviewer patch. The code is now 552 lines versus the previous 625.
comment:55 Changed 8 years ago by
Apply trac_11475mk2.patch
(for patchbot)
comment:56 Changed 8 years ago by
- Status changed from needs_review to needs_work
According to the patchbot logs there are some issues with documentation formatting:
docstring of sage.functions.prime_pi.legendre_phi:39: WARNING: Duplicate explicit target name: "rao2011". docstring of sage.functions.prime_pi.partial_sieve_function:39: WARNING: Duplicate explicit target name: "rao2011". docstring of sage.functions.prime_pi.legendre_phi:39: WARNING: duplicate citation RAO2011, other instance in /storage/masiao/sage-5.0.beta8/devel/sage/doc/en/reference/sage/functions/prime_pi.rst docstring of sage.functions.prime_pi.partial_sieve_function:39: WARNING: duplicate citation RAO2011, other instance in /storage/masiao/sage-5.0.beta8/devel/sage/doc/en/reference/sage/functions/prime_pi.rst
This is basically because there should be only one copy of the bibliography entry in the file, rather than having it afresh in every docstring that cites it.
Also, shouldn't the class PrimePi derive from sage.symbolic.function.BuiltinFunction?, rather from the base Object class?
comment:57 Changed 8 years ago by
- Status changed from needs_work to needs_review
Ok, both issues should be resolved -- it should now work properly in the symbolic ring.
comment:58 Changed 8 years ago by
Apply trac_11475mk2.patch
comment:59 Changed 8 years ago by
- Status changed from needs_review to positive_review
works good to me.
comment:60 Changed 8 years ago by
On the plus side, this fixes some of #6876. On the down side:
WARNING: we draw the plot of ``prime_pi`` as a stairstep function with explicitly drawn vertical lines where the function jumps. Technically Technically there should not be any vertical lines, but they make the graph look much better, so we include them. Use the option ``vertical_lines=False`` to turn these off.
I'm adding a fix for the latter, and a doctest for the former. Shouldn't affect anything.
comment:61 Changed 8 years ago by
- Keywords sd40.5 added
- Reviewers changed from Yann Laigle-Chapuy, Leif Leonhardy to Yann Laigle-Chapuy, Leif Leonhardy, William Stein, Karl-Dieter Crisman
comment:62 Changed 8 years ago by
- Description modified (diff)
Patchbot: Apply trac_11475mk2.patch and trac_11475-reviewer.patch.
Changed 8 years ago by
comment:63 Changed 8 years ago by
William pointed out that this didn't fix #6876 quite how we want, so just fixing the typo.
comment:64 Changed 7 years ago by
- Status changed from positive_review to needs_work
This fails on arando (Linux Ubuntu 12.04 i686, 32 bit):
sage -t --long -force_lib devel/sage/sage/functions/prime_pi.pyx ********************************************************************** File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.1.beta4/devel/sage-main/sage/functions/prime_pi.pyx", line 123: sage: prime_pi(10^10) Expected: 455052511 Got: 70462980 ********************************************************************** File "/var/lib/buildbot/build/sage/arando-1/arando_full/build/sage-5.1.beta4/devel/sage-main/sage/functions/prime_pi.pyx", line 231: sage: for i in (32..42): prime_pi(2^i) # long time (13s on sage.math, 2011) Expected: 203280221 393615806 762939111 1480206279 2874398515 5586502348 10866266172 21151907950 41203088796 80316571436 156661034233 Got: 0 0 0 0 0 0 0 0 0 0 0 **********************************************************************
comment:65 Changed 7 years ago by
On hawk (OpenSolaris i386), it actually Segmentation Faults.
comment:66 Changed 7 years ago by
- Status changed from needs_work to needs_review
comment:67 Changed 7 years ago by
patchbot: apply trac_11475mk2.patch trac_11475-reviewer.patch
comment:68 Changed 7 years ago by
patchbot: apply trac_11475mk2.patch trac_11475-reviewer.patch
comment:69 Changed 7 years ago by
- Status changed from needs_review to positive_review
Jeroen: now it passes on arando. Can you check on hawk?
comment:70 Changed 7 years ago by
- Milestone changed from sage-5.1 to sage-5.2
comment:71 Changed 7 years ago by
- Merged in set to sage-5.2.beta1
- Resolution set to fixed
- Status changed from positive_review to closed
comment:72 Changed 7 years ago by
Why does this ticket make random whitespace changes to sage/functions/all.py
? This is total BS. This just breaks patches on trac and wastes other people's time. For example
#11143 doesn't apply any more.
comment:73 Changed 7 years ago by
- Merged in sage-5.2.beta1 deleted
- Milestone changed from sage-5.2 to sage-5.3
- Resolution fixed deleted
- Status changed from closed to new
On bsd.math (OS X 10.6 x86_64), I still get
sage -t --long -force_lib devel/sage/sage/modular/abvar/torsion_subgroup.py ********************************************************************** File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/devel/sage-main/sage/modular/abvar/torsion_subgroup.py", line 316: sage: G.multiple_of_order() Exception raised: Traceback (most recent call last): File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/ncadoctest.py", line 1231, in run_one_test self.run_one_example(test, example, filename, compileflags) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/sagedoctest.py", line 38, in run_one_example OrigDocTestRunner.run_one_example(self, test, example, filename, compileflags) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/bin/ncadoctest.py", line 1172, in run_one_example compileflags, 1) in test.globs File "<doctest __main__.example_9[7]>", line 1, in <module> G.multiple_of_order()###line 316: sage: G.multiple_of_order() File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/torsion_subgroup.py", line 375, in multiple_of_order f = A.hecke_polynomial(p) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/abvar.py", line 2239, in hecke_polynomial f = self._compute_hecke_polynomial(n, var=var) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/abvar/abvar.py", line 3800, in _compute_hecke_polynomial return sqrt_poly(self.modular_symbols().hecke_polynomial(n, var)) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/module.py", line 1460, in hecke_polynomial return self.hecke_operator(n).charpoly(var) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/hecke_operator.py", line 274, in charpoly return self.matrix().charpoly(var) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/hecke_operator.py", line 747, in matrix self.__matrix = self.parent().hecke_matrix(self.__n, *args, **kwds) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/algebra.py", line 589, in hecke_matrix return self.__M.hecke_matrix(n, *args, **kwds) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/module.py", line 1351, in hecke_matrix T = self._compute_hecke_matrix(n) File "/Users/buildbot/build/sage/bsd-1/bsd_64_full/build/sage-5.2.beta1/local/lib/python/site-packages/sage/modular/hecke/submodule.py", line 239, in _compute_hecke_matrix return A.restrict(self.free_module(), check=check) File "matrix2.pyx", line 4202, in sage.matrix.matrix2.Matrix.restrict (sage/matrix/matrix2.c:23293) File "element.pyx", line 2736, in sage.structure.element.Matrix.__mul__ (sage/structure/element.c:19170) File "coerce.pyx", line 740, in sage.structure.coerce.CoercionModel_cache_maps.bin_op (sage/structure/coerce.c:6778) File "action.pyx", line 147, in sage.matrix.action.MatrixMatrixAction._call_ (sage/matrix/action.c:3098) File "matrix_rational_dense.pyx", line 1047, in sage.matrix.matrix_rational_dense.Matrix_rational_dense._matrix_times_matrix_ (sage/matrix/matrix_rational_dense.c:12115) File "matrix_rational_dense.pyx", line 1090, in sage.matrix.matrix_rational_dense.Matrix_rational_dense._multiply_over_integers (sage/matrix/matrix_rational_dense.c:12248) RuntimeError: Segmentation fault **********************************************************************
comment:74 follow-up: ↓ 75 Changed 7 years ago by
Do either of these cause a segfault?:
sage: prime_pi(2**15) 3512 sage: prime_pi(2**10-1) 172
Those are the only two calls made to prime_pi
made by that doctest.
comment:75 in reply to: ↑ 74 Changed 7 years ago by
Replying to ohanar:
Do either of these cause a segfault?:
sage: prime_pi(2**15) 3512 sage: prime_pi(2**10-1) 172
No.
comment:76 Changed 7 years ago by
- Status changed from new to needs_review
Ok, OSX issue should now be fixed (all tests pass for me on bsd.math now).
comment:77 Changed 6 years ago by
- Milestone changed from sage-5.11 to sage-5.12
comment:78 Changed 6 years ago by
- Branch set to u/ohanar/prime_pi
- Dependencies set to #15138
this patch has bit-rotted, I'll upload a git branch in a moment
comment:79 Changed 6 years ago by
- Commit set to e3e33e62cc19bf93b5236e7d1177482bd52b8a9e
Branch pushed to git repo; I updated commit sha1. New commits:
[changeset:e3e33e6] | Trac 11475 reviewer patch - very minor prime_pi plot doc improvement |
[changeset:a3a0edd] | fix doctest |
[changeset:ef57887] | remove trailing \'s and other little cleanup |
[changeset:ed701d3] | prime_pi: fix off by 1 mallocing issue |
[changeset:bc36b43] | 5x speedup to prime_pi |
comment:80 Changed 6 years ago by
- Milestone changed from sage-6.1 to sage-6.2
comment:81 Changed 6 years ago by
- Milestone changed from sage-6.2 to sage-6.3
comment:82 Changed 6 years ago by
- Status changed from needs_review to positive_review
The patchbot experienced some glitches on this ticket, so I tested it again; all tests pass and the documentation builds. From the comments, it appears that the ticket was already closed, then reopened because of a segfault on OS X 10.6 x86_64, and that this bug was subsequently fixed. Hence let's put it back to positive review.
comment:83 Changed 6 years ago by
- Branch changed from u/ohanar/prime_pi to e3e33e62cc19bf93b5236e7d1177482bd52b8a9e
- Resolution set to fixed
- Status changed from positive_review to closed
based on sage-4.7