Opened 6 years ago

Closed 6 years ago

#15443 closed defect (fixed)

Random time outs in ecm.py

Reported by: vbraun Owned by:
Priority: major Milestone: sage-6.2
Component: number theory Keywords:
Cc: was, robertwb, roed, jpflori Merged in:
Authors: Volker Braun Reviewers: François Bissey
Report Upstream: N/A Work issues:
Branch: 383520e (Commits) Commit: 383520e4ff6269b391b5e9fd474349b06d9de156
Dependencies: Stopgaps:

Description

Infrequently, I get this doctest failure:

sage -t --long src/sage/interfaces/ecm.py
    Timed out
**********************************************************************
Tests run before process (pid=8764) timed out:
sage: f = ECM() ## line 169 ##
sage: n = 508021860739623467191080372196682785441177798407961 ## line 170 ##
sage: f.one_curve(n, B1=10000, sigma=11) ## line 171 ##
[1, 508021860739623467191080372196682785441177798407961]
sage: f.one_curve(n, B1=10000, sigma=1022170541) ## line 173 ##
[79792266297612017, 6366805760909027985741435139224233]
sage: n = 432132887883903108009802143314445113500016816977037257 ## line 175 ##
sage: f.one_curve(n, B1=500000, algorithm="P-1") ## line 176 ##
[67872792749091946529, 6366805760909027985741435139224233]
sage: n = 2088352670731726262548647919416588631875815083 ## line 178 ##
sage: f.one_curve(n, B1=2000, algorithm="P+1", x0=5) ## line 179 ##
[328006342451, 6366805760909027985741435139224233]
sage: sig_on_count() ## line 181 ##
0
sage: f = ECM() ## line 241 ##
sage: n = 508021860739623467191080372196682785441177798407961 ## line 242 ##
sage: f.find_factor(n) ## line 243 ##
[79792266297612017, 6366805760909027985741435139224233]
sage: f=2^2^14+1 ## line 247 ##
sage: ecm.find_factor(f) ## line 248 ##
sage: sig_on_count() ## line 252 ##
0
sage: ecm.factor(602400691612422154516282778947806249229526581) ## line 333 ##
[45949729863572179, 13109994191499930367061460439]
sage: ecm.factor((2^197 + 1)/3)           # takes a long time ## line 336 ##

Attachments (5)

707.out (16.7 KB) - added by fbissey 6 years ago.
strace for ecm failure run 707
1676.out (16.7 KB) - added by fbissey 6 years ago.
strace for ecm run 1676
3822.out (16.7 KB) - added by fbissey 6 years ago.
strace for ecm failure run 3822
test_until_fail.py (706 bytes) - added by vbraun 6 years ago.
script for automatted testing with strace
strace.5385.bz2 (19.9 KB) - added by fbissey 6 years ago.
ecm strace for the failure

Download all attachments as: .zip

Change History (48)

comment:1 Changed 6 years ago by vbraun

ecm uses different seeds all the time, for example

echo 179424673 | /home/vbraun/Code/sage/local/bin/ecm -I 1 -c 1000000000 -cofdec -one 8.0

produces a different output every time.


Occasionally, ecm.py does a lot more factorizations and then eventually times out. Using strace, I see that, for example, the following is run when it eventually times out;

echo 251951573867253012259144010843 | /home/vbraun/Code/sage/local/bin/ecm -I 1 -c 1000000000 -cofdec -one 8.0

Note that the command is rather slow (requires hundreds of trial steps) and that this prime (251951573867253012259144010843) is usually not used as an ecm input. So occasionally (but unlikely) one of the previous computations must yield this number as candidate and then factoring it takes a really long time.

comment:2 Changed 6 years ago by vbraun

In fact, the problematic doctest is factoring (2^197 + 1)/3. Usually, this yields from ecm:

Factor found in step 2: 197002597249
Found probable prime factor of 12 digits: 197002597249
Composite cofactor 339872432034468861533158743041639097889948066859 has 48 digits

But randomly (and rather unlikely):

Factor found in step 2: 265748496095531068869578877937
Found composite factor of 30 digits: 265748496095531068869578877937
Probable prime cofactor 251951573867253012259144010843 has 30 digits

So this is the origin of the prime 251951573867253012259144010843, which is then subsequently factored (but apparently very slowly).

comment:3 Changed 6 years ago by vbraun

Fun find: run ecm find_factor() on a prime:

sage: f = ECM()
sage: f.find_factor(3)

OMG zombies everywhere!

comment:4 Changed 6 years ago by jdemeyer

Thanks for the analysis! Just a question: do you plan to write a patch (if not, I might).

comment:5 Changed 6 years ago by vbraun

In working on it!

comment:6 Changed 6 years ago by vbraun

  • Branch set to u/vbraun/ecm_cleanup

comment:7 Changed 6 years ago by git

  • Commit set to 998126dbd22d30eb34b46a0f122f9d62a9f861c9

Branch pushed to git repo; I updated commit sha1. New commits:

998126dactually fix the source of the random (but unlikely) doctest failures

comment:8 Changed 6 years ago by vbraun

  • Authors set to Volker Braun
  • Cc was robertwb added
  • Status changed from new to needs_review

I haven't found any other issue in repeated testing, ready for review!

comment:9 Changed 6 years ago by fbissey

So I got 6.0_beta4 built and I tested sage/interfaces/ecm.py a thousand time. The last one (number 1000) failed :(

Running doctests with ID 2013-12-11-12-59-50-efe5883e.
Doctesting 1 file.
sage -t --long local/lib/python2.7/site-packages/sage/interfaces/ecm.py
**********************************************************************
File "local/lib/python2.7/site-packages/sage/interfaces/ecm.py", line 606, in sage.interfaces.ecm.ECM.get_last_params
Failed example:
    ecm.factor((2^197 + 1)/3)             # long time
Expected:
    [197002597249, 1348959352853811313, 251951573867253012259144010843]
Got:
    [197002597249, 339872432034468861533158743041639097889948066859]
**********************************************************************
1 item had failures:
   1 of   3 in sage.interfaces.ecm.ECM.get_last_params
    [43 tests, 1 failure, 6.84 s]
----------------------------------------------------------------------
sage -t --long local/lib/python2.7/site-packages/sage/interfaces/ecm.py  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 6.9 seconds
    cpu time: 0.1 seconds
    cumulative wall time: 6.8 seconds

How uncool is that.

comment:10 Changed 6 years ago by vbraun

You don't happen to have a strace of the failed attempt?

comment:11 Changed 6 years ago by fbissey

No haven't thought of that. I am doing another run right now to see if I can reproduce it. But I will do that if I can reproduce it. The other thing that struck me is that the time to run the test is highly variable from 3 to 10 seconds and I have no idea why that would be.

comment:12 Changed 6 years ago by fbissey

Nothing on the second run. I have bumped the number to 2000 and I am recording straces in case I see something again.

comment:13 Changed 6 years ago by fbissey

Ok I am not sure what kind of fluck happened in my first run. It is still the only failure I have seen in 4000 runs of that test.

comment:14 Changed 6 years ago by vbraun

The timing difference is because ecm is random and without a fixed seed. Sometimes it just takes longer. And also depends on which of the three factors is found first.

I've run 20k iterations of ecm.factor((2^197 + 1)/3) without any failure so far...

comment:15 Changed 6 years ago by vbraun

Ok, the 22259-th iteration did fail ;-) Now trying again with strace...

Changed 6 years ago by fbissey

strace for ecm failure run 707

Changed 6 years ago by fbissey

strace for ecm run 1676

Changed 6 years ago by fbissey

strace for ecm failure run 3822

comment:16 Changed 6 years ago by fbissey

I started a 200000 sweep which stopped at 5040 (probably run out of inodes in the folder storing the straces) I have attached the straces for the 3 failures I got.

comment:17 Changed 6 years ago by vbraun

I need the strace of the ecm process, not of the main Sage python process... the latter doesn't tell us much.

comment:18 Changed 6 years ago by fbissey

OK I thought it was thin. How do I attach to the started ecm process? Any suggestion for doing it in an automated way? strace -f (-ff)?

Changed 6 years ago by vbraun

script for automatted testing with strace

comment:19 Changed 6 years ago by vbraun

Yes, or see the attached script.

comment:20 Changed 6 years ago by fbissey

I think I understand how the script works, one more turn, then.

comment:21 Changed 6 years ago by fbissey

OK I got a failure. Any clue to find which strace is the good one?

comment:22 Changed 6 years ago by vbraun

Grep for bin/ecm, for starters...

comment:23 Changed 6 years ago by fbissey

Better idea. The failure is

sage -t --long local/lib/python2.7/site-packages/sage/interfaces/ecm.py
**********************************************************************
File "local/lib/python2.7/site-packages/sage/interfaces/ecm.py", line 572, in sage.interfaces.ecm.ECM.factor
Failed example:
    ecm.factor(602400691612422154516282778947806249229526581)
Expected:
    [45949729863572179, 13109994191499930367061460439]
Got:
    [602400691612422154516282778947806249229526581]

Grepping for 602400691612422154516282778947806249229526581 does the trick nicely. Two hits one from sage and one from ecm.

Changed 6 years ago by fbissey

ecm strace for the failure

comment:24 Changed 6 years ago by vbraun

Reconstructed ECM output:

GMP-ECM 6.4.4 [configured with MPIR 2.6.0, --enable-asm-redc] [ECM]
Input number is 602400691612422154516282778947806249229526581 (45 digits)
Using B1=2000, B2=147396, polynomial x^1, sigma=4179203867
Step 1 took 2ms
Step 2 took 3ms
Run 2 out of 1000000000:
Using B1=2399, B2=2399-186156, polynomial x^1, sigma=4164584203
Step 1 took 2ms
Step 2 took 4ms
Run 3 out of 1000000000:
Using B1=2806, B2=2806-224406, polynomial x^1, sigma=3854196504
Step 1 took 3ms
Step 2 took 3ms
Run 4 out of 1000000000:
Using B1=3221, B2=3221-294786, polynomial x^1, sigma=3420982637
Step 1 took 3ms
Step 2 took 5ms
Run 5 out of 1000000000:
Using B1=3644, B2=3644-294786, polynomial x^1, sigma=1430191020

[...]

Using B1=22052, B2=22052-5026572, polynomial x^1, sigma=1562069045
Step 1 took 20ms
Step 2 took 26ms
Run 38 out of 1000000000:
Using B1=22779, B2=22779-5026572, polynomial x^1, sigma=1306733316
Step 1 took 21ms
Step 2 took 25ms
Run 39 out of 1000000000:
Using B1=23516, B2=23516-5031192, polynomial x^1, sigma=2409042976
Step 1 took 22ms
Step 2 took 25ms
Run 40 out of 1000000000:
Using B1=24263, B2=24263-5031192, polynomial x^1, sigma=518178309
Step 1 took 22ms
Step 2 took 26ms
********** Factor found in step 2: 602400691612422154516282778947806249229526581
Found input number N

So it seems that the "Found input number N" output doesn't imply that it is probably prime. Some documentation would have been nice, too. Though that leaves the question: how should we decide to give up? If the number is actually prime then it always ends in "Found input number N", so we can't just repeat until we find a factor. The readme says

5 - if no factor is found, either increase D by 5 digits and go to 0, or use 
    another factorization method (MPQS, NFS)

but doesn't give any probability estimate in that case. Also, one of the factors in question is only 17 digits so the B1-values tried should have been just fine.

comment:25 Changed 6 years ago by nbruin

Ah, OK. I see what happens. The case you are running into really has very low probability and it is surprising you're triggering it at all. You're basically finding BOTH factors together. The idea of ECM is to choose a group E over Z together with a group element G, such that for the integer N=p*q (it works for products of more primes too) such that the order of G mod p is B-smooth (allowing some non-smoothness with "second stage" large prime variation). However, if G mod q is also B-smooth then you're detecting both factors together.

In the analysis of ECM, it turns out the optimal parameter choice is such that the probability of hitting an (E,G) pair with B-smooth order of G mod p is pretty low. The probability of G mod q also being B-smooth should be pretty much independent. If p,q are roughly of the same size and you need on average N trials to find G mod p of B-smooth order, then you expect that in 1/N of those cases, G mod q also has B-smooth order. If q is much larger than p then this probability should be much lower. I'm surprised that for N=45949729863572179*13109994191499930367061460439 you guys are encountering this at all. It suggests that the B1 value used is a little on the large side if this happens with discernable frequency (what may happen is that you're simply unlucky the first few times and then you have increased B1 to such a point that finding both G mod p and G mod q to be B-smooth isn't such an unlikely event anymore. So perhaps you're increasing B1 a little too aggressively)

Anyway, as you can see, certainly for p,q of roughly the same size (not ECM's forte, but if p,q are not big, ECM can still be useful for those), finding p*q as a factor is really something that has to be counted on.

You should REALLY NOT run ECM on a number that might be prime. You should FIRST prove the number is composite for certain. In that case, you can treat finding the input as factor as finding no factor at all and simply try another time (perhaps NOT increasing B1)

comment:26 follow-up: Changed 6 years ago by vbraun

So for a poor physicist like me, how exactly should we go about testing (probabilistically, but with very low probability of failure comparable to the probability in ECM) that a number is not composite. Is that already implemented somewhere in Sage? Rabin-Miller with some particular list of witnesses?

comment:27 in reply to: ↑ 26 ; follow-up: Changed 6 years ago by nbruin

Replying to vbraun:

So for a poor physicist like me, how exactly should we go about testing (probabilistically, but with very low probability of failure comparable to the probability in ECM) that a number is not composite. Is that already implemented somewhere in Sage? Rabin-Miller with some particular list of witnesses?

I'd expect it to be available somewhere, yes, at the very least in PARI/GP. Baillie-PSW seems to be particularly bullet-proof (in the physics sense), with no known pseudo-primes. Finding a counterexample could raise some money for the sage foundation.

Implementation accessible in sage:

sage: pari(N).ispseudoprime()

(so, presently, no-one is aware of a particular composite value of N for which this returns true, although I suspect no-one would conjecture such values don't exist)

Last edited 6 years ago by nbruin (previous) (diff)

comment:28 in reply to: ↑ 27 Changed 6 years ago by jdemeyer

Replying to nbruin:

(so, presently, no-one is aware of a particular composite value of N for which this returns true, although I suspect no-one would conjecture such values don't exist)

According to the PARI guys, the standard conjecture is that there exist infinitely many counterexamples, even though no single one is known.

comment:29 Changed 6 years ago by vbraun

  • Cc roed added

comment:30 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:31 Changed 6 years ago by jpflori

  • Cc jpflori added

comment:32 Changed 6 years ago by rws

  • Status changed from needs_review to needs_work

comment:33 Changed 6 years ago by git

  • Commit changed from 998126dbd22d30eb34b46a0f122f9d62a9f861c9 to dbc4fec513ab7ad0c2c139e348f61d0851911920

Branch pushed to git repo; I updated commit sha1. New commits:

ba722d7Combine ECM-GMP with Baillie-PSW for prime factorization
ec8423cMergend in 6.2.beta1
dbc4fecintegrate better with Integer.is_prime

comment:34 Changed 6 years ago by vbraun

  • Status changed from needs_work to needs_review

I belive this branch now addresses all comments.

comment:35 Changed 6 years ago by fbissey

How do you address #comment:9 ?

comment:36 follow-up: Changed 6 years ago by vbraun

I'm now first using a strong primality test and only then run ecm forever. So unless you have a counterexample to Baillie-PSW you are guaranteed to get the prime decomposition.

comment:37 in reply to: ↑ 36 Changed 6 years ago by jpflori

Replying to vbraun:

I'm now first using a strong primality test and only then run ecm forever. So unless you have a counterexample to Baillie-PSW you are guaranteed to get the prime decomposition.

To be clearer, you mean you run the probable primality test on what ECM.factor() returns and only stop if all factors pass? So in the previous failing tests from comment:9, it will continue testing other curves when we were unlucky and ECM returned the composite input number?

Groumpf, or rather if it does not pass probable primality at first, you only stop when the number of returned factors is strictly greater than one? If you went for that, comment:23 will be solved, but not comment:9. So another question, is it still possible to call ECM without that additional probable primality test?

From my point of view, we should just accept the fact that ECM sometimes returns the full composite number (just as it will return factors which are composite), but document it and modify the doctests.

comment:38 follow-up: Changed 6 years ago by vbraun

Maybe you want to read the factor() method. First primality test, if that fails run ecm until it finds a factor. Repeat until you get a prime factorization.

There is already a method find_factor() that just runs the ECM output and may or may not return a factor. But that is a) annoying to doctest and b) not very user-friendly.

comment:39 in reply to: ↑ 38 Changed 6 years ago by jpflori

Replying to vbraun:

Maybe you want to read the factor() method. First primality test, if that fails run ecm until it finds a factor. Repeat until you get a prime factorization.

Sure, just had no time. It looks good and I 'll try to review that tonight if I catch some time.

There is already a method find_factor() that just runs the ECM output and may or may not return a factor. But that is a) annoying to doctest and b) not very user-friendly.

comment:40 Changed 6 years ago by fbissey

Now that I have read the patch better (anything is better than reading a patch early in the morning after being woken up by a crying 3 year old) it looks good. I don't really have the time to give it the trashing I gave the previous one right now but I assume you did :)

comment:41 Changed 6 years ago by vbraun

  • Reviewers set to François Bissey
  • Status changed from needs_review to positive_review

comment:42 Changed 6 years ago by git

  • Commit changed from dbc4fec513ab7ad0c2c139e348f61d0851911920 to 383520e4ff6269b391b5e9fd474349b06d9de156
  • Status changed from positive_review to needs_review

Branch pushed to git repo; I updated commit sha1 and set ticket back to needs_review. New commits:

383520efix doctest error

comment:43 Changed 6 years ago by vbraun

  • Branch changed from u/vbraun/ecm_cleanup to 383520e4ff6269b391b5e9fd474349b06d9de156
  • Resolution set to fixed
  • Status changed from needs_review to closed
Note: See TracTickets for help on using tickets.