Opened 15 years ago
Closed 15 years ago
#2170 closed enhancement (fixed)
[with patch, with negative review] sage's integer.pyx digits function sucks in the base 10 case
Reported by: | was | Owned by: | somebody |
---|---|---|---|
Priority: | major | Milestone: | sage-3.0 |
Component: | basic arithmetic | Keywords: | |
Cc: | Merged in: | ||
Authors: | Reviewers: | ||
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description
The digits function should be rewritten to have a special case in the base 10 case, since as illustrated below it is currently WAY slower than just doing str(...) on the input number!
sage: a = 3^100000 sage: time w = a.digits(base=10) CPU times: user 1.00 s, sys: 0.01 s, total: 1.01 s Wall time: 1.01 sage: time v = list(reversed(str(a))) CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s Wall time: 0.02 sage: v[:10] ['1', '0', '0', '0', '0', '0', '2', '2', '5', '5'] sage: w[:10] [1, 0, 0, 0, 0, 0, 2, 2, 5, 5]
Attachments (3)
Change History (14)
comment:1 Changed 15 years ago by
Summary: | sage's integer.pyx digits function sucks in the base 10 case → [with patch, needs review] sage's integer.pyx digits function sucks in the base 10 case |
---|
Changed 15 years ago by
Attachment: | 8312.patch added |
---|
comment:2 Changed 15 years ago by
Summary: | [with patch, needs review] sage's integer.pyx digits function sucks in the base 10 case → [with patch, negative review] sage's integer.pyx digits function sucks in the base 10 case |
---|
REVIEW: The patch needs to be changed to only use the mpz string trick when the base is small enough to work with GMP. Right now we have:
Before patch:
sage: a = 9939082340 sage: a.digits(512) [100, 302, 26, 74]
After patch:
sage: a = 9939082340 sage: a.digits(10) ['0', '4', '3', '2', '8', '0', '9', '3', '9', '9'] sage: a.digits(32) ['4', '3', 'n', 'k', '6', '8', '9'] sage: a.digits(62) ['g', 'L', 'N', 'd', 'q', 'A'] sage: a.digits(63) ------------------------------------------------------------ Unhandled SIGBUS: A bus error occured in SAGE. This probably occured because a *compiled* component of SAGE has a bug in it (typically accessing invalid memory) or is not properly wrapped with _sig_on, _sig_off. You might want to run SAGE under gdb with 'sage -gdb' to debug this. SAGE will now terminate (sorry). ------------------------------------------------------------
comment:3 Changed 15 years ago by
l = <object> PyString_FromString(str_out)
could simply read
l = str_out
and Cython will do the string conversion for you. This looks like it was copied from Integer.str()
and should be changed there as well.
It certainly needs to be fixed to handle the base > 62 case, and this patch changes the semantics of the function as well, so I don't think it should be applied as is.
Changed 15 years ago by
Attachment: | 8312.2.patch added |
---|
comment:4 Changed 15 years ago by
Summary: | [with patch, negative review] sage's integer.pyx digits function sucks in the base 10 case → [with revised patch, needs new review] sage's integer.pyx digits function sucks in the base 10 case |
---|
Attached is a revised patch, fixing the issues reported by the reviewers (whom we thank):
sage: a = 9939082340 sage: a.digits(512) [100, 302, 26, 74] sage: a.digits(10) [0, 4, 3, 2, 8, 0, 9, 3, 9, 9] sage: a = 3^100000 sage: time w = a.digits(base=10) CPU times: user 0.12 s, sys: 0.00 s, total: 0.12 s Wall time: 0.11 sage: w[:10] [1, 0, 0, 0, 0, 0, 2, 2, 5, 5]
We also made the change suggested by Robert (in Integer.str() too). The new digits() function is faster for 10 <= base <= 62, but not as fast as in the 1st (wrong) patch; the bottleneck seems to be the map(make_integer, l) call.
comment:5 Changed 15 years ago by
Attached is a revised patch,
This patch seems to be identical to the original patch.
Changed 15 years ago by
Attachment: | 8683.patch added |
---|
comment:6 Changed 15 years ago by
This patch seems to be identical to the original patch.
Sorry for that. The new one (8683.patch) should be the correct one.
comment:7 Changed 15 years ago by
I applied 8312.patch and 8683.patch (in that order) to a vanilla 2.10.2. I may be losing my mind, but it sure appears to me that the patched version is actually *slower* than 2.10.2. I would also point out that there is no point in declaring 'l' as a 'cdef object' -- that's the cython default. I also think that " = PyList_New(0)" allocates a python list that is never used (just overwritten later).
As to the original bug, it seems quite intrinsic to me that digits would be slower than str. The digits method needs to create individual python objects for each and every digit, str does not. There's a frikkin lot of memory allocation going on there. With that in mind, I'm not at all convinced that the bug is even fixable at all.
comment:8 Changed 15 years ago by
Summary: | [with revised patch, needs new review] sage's integer.pyx digits function sucks in the base 10 case → [with patch, with conditionally positive review] sage's integer.pyx digits function sucks in the base 10 case |
---|
Oh, I see, upon more reflection. I see that my idea of "big" wasn't quite big enough. Once, we have a really big number, this patch becomes effective.
Here's what I observe for moderate numbers:
# vanilla 2.10.2 sage: a = 28356982659^15 sage: %timeit w = a.digits(base=10) 10000 loops, best of 3: 67 µs per loop
# Patched version sage: a = 28356982659^15 sage: %timeit w = a.digits(base=10) 10000 loops, best of 3: 136 µs per loop
I do see that my memory allocation comment is entirely incorrect though. I'm still a bit disturbed though. Why should we suddenly get slower at 63? Can't we extract gmp's algorithm for larger bases? Part of the reason for my questions is that I want to use the digits method for mpoly factoring and it has some issues with large bases.
Modulo my comments above about the variable 'l', I think this is a good patch for the moment and should be included. I do think there is a better answer in the long-term for larger bases though.
comment:9 Changed 15 years ago by
Summary: | [with patch, with conditionally positive review] sage's integer.pyx digits function sucks in the base 10 case → [with patch, with negative review] sage's integer.pyx digits function sucks in the base 10 case |
---|
Well, I guess I should've waited to post the first two after I thought about this a bit more. :)
I now have some serious reservations about this patch:
1) It actually makes small cases slower. (See above)
2) It doesn't handle negatives correctly (what is correct?) -- we need a doc-test to clarify intent here.
sage: n=-20385 sage: n.digits(100) # this is what used to occur with negatives [-85, -3, -2] sage: n.digits(10) # the last zero comes from '-' in the string? [5, 8, 3, 0, 2, 0]
3) The version that runs with a non-None digits parameter is still slow.
4) A big base (bigger than 62) is still slow.
5) I now have cold feet about the 'PyMem_Free'. Shouldn't it be 'free' since presumably gmp allocated with 'malloc'? Or, did we hack gmp? Or, doesn't it matter?
Now, this raises some questions in my mind. What should happen with negatives when digits are specified?
comment:10 Changed 15 years ago by
With apologies to zimmerma and Laurent Imbert for stealing their thunder, I attached a patch to #2362 which fixes that bug and this bug.
comment:11 Changed 15 years ago by
Resolution: | → fixed |
---|---|
Status: | new → closed |
As Joel pointed out in IRC this ticket can be closed as fixed since his patches at #2363 [that have been merged] fix the issue.
Cheers,
Michael
Together with Laurent Imbert, we propose the attached patch. This solves the case where the base is >= 10.