Opened 15 years ago

Last modified 8 years ago

#1145 needs_work enhancement

high-level strategy for integer factorization

Reported by: Paul Zimmermann Owned by: aapitzsch
Priority: minor Milestone: sage-6.4
Component: factorization Keywords:
Cc: Jeroen Demeyer, Jean-Pierre Flori Merged in:
Authors: Reviewers:
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by aapitzsch)

I propose the following strategy for factor(integer):

  1. do trial division by all primes up to say 1000. This can be done efficiently by a single gcd with the product of all those primes.
  2. use GMP-ECM, starting from say B1=100, and increasing B1 by sqrt(B1) at each step, until one reaches the _recommended_B1_list value which corresponds to 1/3 of the size of the number to be factored. Thus for a 90-digit input, one will stop at B1=250000.
  3. try GMP-ECM P-1 and P+1 with respectively 9*B1 and 3*B1 where B1 is the last value tried for ECM. The corresponding cost of those runs will be approximately the same as the last ECM curve, thus this will not slow down the average computation, and might find a few factors.
  4. run MPQS or GNFS. You might want to issue a warning to the user (if called from toplevel) at that time.

Apply

  1. #5310
  2. #5945
  3. #10623
  4. trac_1145_integer_factorization.patch

Attachments (3)

evaluate_1145.sage (1.3 KB) - added by aapitzsch 12 years ago.
trac_1145_cunningham_warning.patch (1001 bytes) - added by aapitzsch 12 years ago.
trac_1145_integer_factorization.patch (27.5 KB) - added by aapitzsch 12 years ago.

Download all attachments as: .zip

Change History (41)

comment:1 Changed 15 years ago by Michael Abshoff

Milestone: sage-2.9

comment:2 Changed 15 years ago by William Stein

Description: modified (diff)

comment:3 Changed 15 years ago by William Stein

Comments from some experts:

On 12/11/2007, Robert Bradshaw <robertwb@math.washington.edu> wrote:
> I don't have much expertise in the area, but it looks like a sound
> proposal to me. The trial division bound seems a bit low (and perhaps
> should be adjusted for the size of the input). Is this similar to
> what Pari does? Would it make sense to parallelize ECM/QS by default
> on a multi-core system?
>
> - Robert
>
Bill Hart <goodwillhart@googlemail.com>
	
	
This proposal is absolutely correct. It is *exactly* what I would do,
with one exception. I would not issue a warning that MPQS is going to
start, unless the factorisation is over say 70 digits (~90s).

Bill.

comment:4 Changed 15 years ago by wbhart

I've just discussed this with Paul Zimmerman and here is what we should do (eventually):

1) Trial factoring for primes up to ~1000 2) SQUFOF for numbers up to ~32-40 bits (use the fast implementation in FLINT - it's 4x faster than anything else) 3) Pollard Rho if we have an efficient implementation 4) tinyQS in FLINT *if* the number of digits is small enough, i.e. if that is going to be faster than GMP-ECM (we have to profile this to determine the cutoff) 5) GMP-ECM with bound up to n(1/3) 6) p-1 and p+1 with large bound (it works rarely, but saves MPQS if it happens to work - the runtime compared to MPQS is *tiny*) 7) MPQS up to ~90 digits if large prime variant, ~100 digits if double large prime variant or ~130 digits if triple large prime variant QS 8) GNFS 9) You are the NSA, so you know what to do.

comment:5 Changed 15 years ago by Paul Zimmermann

Steps 2) and 3) of my original proposal might be easily implemented using the "finer ECM interface" proposed in #1421.

comment:6 Changed 15 years ago by Michael Abshoff

Milestone: sage-2.10sage-2.9.2

comment:7 Changed 13 years ago by David Loeffler

Component: number theoryfactorization
Owner: changed from William Stein to tbd

comment:8 Changed 13 years ago by Paul Zimmermann

Report Upstream: N/A
Resolution: wontfix
Status: newclosed

since no progress was done in 2 years, I close that ticket.

comment:9 Changed 13 years ago by Minh Van Nguyen

Milestone: sage-4.3.2sage-duplicate/invalid/wontfix

Make sure you understand the procedure for closing tickets. See this section of the Developer's Guide for more information.

comment:10 Changed 13 years ago by Robert Bradshaw

Milestone: sage-duplicate/invalid/wontfixsage-wishlist
Resolution: wontfix
Status: closednew

This is still an issue--I have code that I need to clean up and post here (though probably won't get around to it 'till this summer).

comment:11 Changed 13 years ago by Paul Zimmermann

Make sure you understand the procedure for closing tickets.

sorry, I wasn't aware of that.

comment:12 Changed 12 years ago by aapitzsch

Owner: changed from tbd to aapitzsch

Changed 12 years ago by aapitzsch

Attachment: evaluate_1145.sage added

comment:13 Changed 12 years ago by aapitzsch

Cc: Jeroen Demeyer added
Status: newneeds_review

Apply first: #5945, #5310

Patch makes use of SQUFOF, ecmfactor, cunningham_table and sympy.factorint.

To pass all tests cunningham database should become a standard package or the warning in cunningham_tables.py should be removed.

To evalutate run attachment:evaluate_1145.sage

This should also fix #9463.

comment:14 Changed 12 years ago by Jeroen Demeyer

I have personally been considering moving all the integer factorisation code to a new module. I think it makes sense because a lot of new code is being added and sage/rings/integer.pyx is already large enough.

What do you think?

comment:15 Changed 12 years ago by Robert Bradshaw

+1 to moving all the code to a new factor module. I've just very briefly looked at your code. Also, in terms of code structure, there's no need for all the functions to start with an underscore, and it's probably worth making them cpdef functions for ease of testing. (There's no overhead for module-level methods, as they can't be overridden, so the cdef portion is just as fast.) How does sympy.factorint compare to, say, pari's sieves? What about the quadratic sieve from flint?

comment:16 Changed 12 years ago by aapitzsch

Okay, I will move the code to a new module. How should it be called? Should it stay in sage/rings?

sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.

quadratic sieve from flint is currently not installed, so I couldn't use it.

comment:17 in reply to:  16 Changed 12 years ago by Jeroen Demeyer

Replying to aapitzsch:

Okay, I will move the code to a new module. How should it be called? Should it stay in sage/rings?

sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.

Personally, I wouldn't care that much about huge perfect powers. I agree with the PARI people that this is essentially a non-issue.

comment:18 in reply to:  16 Changed 12 years ago by Jeroen Demeyer

Replying to aapitzsch:

Okay, I will move the code to a new module. How should it be called? Should it stay in sage/rings?

I would call it sage/rings/factorint.pyx or so. It would be nice if you would move *all* the integer factoring code there, not only the newly added code.

I would love to help with this ticket, but I'm already involved in too many tickets...

comment:19 Changed 12 years ago by aapitzsch

Perfect power checking doesn't cost that much but it improves considerably the speed if it's a perfect power.

comment:20 in reply to:  19 Changed 12 years ago by Jeroen Demeyer

Replying to aapitzsch:

Perfect power checking doesn't cost that much but it improves considerably the speed if it's a perfect power.

Trial division doesn't cost that much but it improves considerably the speed if it has lots of small factors.

Now guess which is more likely for a random number...

comment:21 Changed 12 years ago by Jeroen Demeyer

Besides, if you want to check for a power, just check for a perfect power. No need to call sympy.factorint() for that.

comment:22 Changed 12 years ago by aapitzsch

That's what I do.

if n.is_power(): 
    from sympy import factorint 
    return [(Integer(p),e) for p,e in factorint(n).items()]

comment:23 Changed 12 years ago by Jeroen Demeyer

That doesn't look good. If n is a power, you should write n as b^k and then factor b and multiply all exponents by k instead of using the slower sympy.factorint().

comment:24 in reply to:  16 Changed 12 years ago by Robert Bradshaw

Replying to aapitzsch:

Okay, I will move the code to a new module. How should it be called? Should it stay in sage/rings?

Yes, that'd be a fine place for it.

sympy.factorint becomes slow very fast but it can handle perfect powers better than other functions.

Then -1 from moving away from Pari. We can check for perfect powers very quickly using gmp/mpir first.

quadratic sieve from flint is currently not installed, so I couldn't use it.

Well, expect some followup patches then :).

  • Robert

comment:25 Changed 12 years ago by aapitzsch

I moved the integer factorization code to factorint.pyx. The patch is attached to #5945 because adding factor_using_flint is a minor change than this one, so review should be easier.

Currently arith.factor links to factorint.factor. We can create a ticket to get rid of it after fixing #5945. But for now it would be to much I think.

comment:26 Changed 12 years ago by aapitzsch

Status: needs_reviewneeds_work

Changed 12 years ago by aapitzsch

comment:27 Changed 12 years ago by aapitzsch

Description: modified (diff)
Status: needs_workneeds_review

I changed the things mentioned above.

comment:28 Changed 12 years ago by Paul Zimmermann

if somebody wants to add GNFS to the list of algorithms available from Sage, CADO-NFS 1.0 was just released: http://cado-nfs.gforge.inria.fr/

comment:29 Changed 12 years ago by Robert Bradshaw

Depends on #5310 and #5945

Apply trac_1145_integer_factorization.patch and trac_1145_cunningham_warning.patch

comment:30 Changed 12 years ago by aapitzsch

Description: modified (diff)
Milestone: sage-wishlistsage-4.6.2

Changed 12 years ago by aapitzsch

comment:31 Changed 11 years ago by Paul Zimmermann

Yafu (http://sourceforge.net/projects/yafu/) might be an interesting software to include, although I have difficulties finding its license. It includes a multi-thread implementation of SIQS, which might be interesting in the 70-100 digit range.

Paul Zimmermann

comment:32 Changed 11 years ago by David Roe

Status: needs_reviewneeds_work

I know that it's waiting on #5310, but your patch is doubled: you exported to a file that already contained the patch. Also, we should figure out how to merge these factoring improvements with those in #12125 and #12133; it shouldn't be too hard.

A couple other comments: you should use the perfect_power method of Integers rather than writing your own base_exponent, and you should pass on the use_pollard parameter when you recurse.

comment:33 Changed 10 years ago by Jean-Pierre Flori

Cc: Jean-Pierre Flori added

comment:34 Changed 9 years ago by Marco Streng

See also #14970

comment:35 Changed 9 years ago by Jeroen Demeyer

Milestone: sage-5.11sage-5.12

comment:36 Changed 9 years ago by For batch modifications

Milestone: sage-6.1sage-6.2

comment:37 Changed 8 years ago by For batch modifications

Milestone: sage-6.2sage-6.3

comment:38 Changed 8 years ago by For batch modifications

Milestone: sage-6.3sage-6.4
Note: See TracTickets for help on using tickets.