Opened 11 years ago
Closed 11 years ago
#11657 closed defect (fixed)
the vector(...) function is extremely slow
Reported by: | William Stein | Owned by: | jason, was |
---|---|---|---|
Priority: | minor | Milestone: | sage-4.7.2 |
Component: | linear algebra | Keywords: | sd32 |
Cc: | Merged in: | sage-4.7.2.alpha3 | |
Authors: | William Stein, Rob Beezer | Reviewers: | Rob Beezer, William Stein |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
I was shocked by this:
sage: timeit('vector(ZZ,100)') 625 loops, best of 3: 302 µs per loop sage: timeit('(ZZ^100)(0)') 625 loops, best of 3: 24.9 µs per loop sage: timeit('(ZZ^100).zero_vector()') 625 loops, best of 3: 21.8 µs per loop
I didn't realize that the special case of the zero vector is incredibly slow for the vector
function. This needs to be fixed.
Apply:
Attachments (3)
Change History (12)
comment:1 Changed 11 years ago by
Changed 11 years ago by
Attachment: | trac_11657-zero-vector-speedup.patch added |
---|
comment:2 Changed 11 years ago by
Authors: | → Rob Beezer |
---|---|
Description: | modified (diff) |
Status: | new → needs_review |
Patch generally improves speed of conveniences (constructors) for creating zero vectors. Fastest route seems to usually be the .zero_vector()
method of a module, which just barely beats coercing a zero scalar into the module most of the time.
The zero_vector()
*constructor function* now uses the .zero_vector()
*method*, and the vector()
method short-circuits to return a zero vector just as soon as possible. The multi-format capabilities of the vector()
constructor require some necessary overhead, which seems to be a constant 30 micro-seconds on my machine.
Additions to documentation provide advice for the truly speed-hungry.
4.7.1, without patch:
sage: n = 1000 sage: timeit("(ZZ^n).zero_vector()") 625 loops, best of 3: 69.9 µs per loop sage: timeit("(ZZ^n)(0)") 625 loops, best of 3: 74.6 µs per loop sage: timeit("zero_vector(ZZ, n)") 125 loops, best of 3: 2.55 ms per loop sage: timeit("vector(ZZ, n)") 125 loops, best of 3: 2.52 ms per loop sage: n = 10000 sage: timeit("(ZZ^n).zero_vector()") 625 loops, best of 3: 613 µs per loop sage: timeit("(ZZ^n)(0)") 625 loops, best of 3: 617 µs per loop sage: timeit("zero_vector(ZZ, n)") 25 loops, best of 3: 24.8 ms per loop sage: timeit("vector(ZZ, n)") 25 loops, best of 3: 24.9 ms per loop
4.7.1, with patch:
sage: n = 1000 sage: timeit("(ZZ^n).zero_vector()") 625 loops, best of 3: 73.9 µs per loop sage: timeit("(ZZ^n)(0)") 625 loops, best of 3: 77.2 µs per loop sage: timeit("zero_vector(ZZ, n)") 625 loops, best of 3: 78.1 µs per loop sage: timeit("vector(ZZ, n)") 625 loops, best of 3: 109 µs per loop sage: n = 10000 sage: timeit("(ZZ^n).zero_vector()") 625 loops, best of 3: 624 µs per loop sage: timeit("(ZZ^n)(0)") 625 loops, best of 3: 621 µs per loop sage: timeit("zero_vector(ZZ, n)") 625 loops, best of 3: 624 µs per loop sage: timeit("vector(ZZ, n)") 625 loops, best of 3: 654 µs per loop
comment:3 Changed 11 years ago by
Your examples above, with large n, are probably unfair, since they are much less likely. I'm still concerned by how very slow this is with smaller n, e.g., even with your patch:
sage: timeit('vector(ZZ,50)') 625 loops, best of 3: 44.3 µs per loop sage: timeit('(ZZ^50).zero_vector()') 625 loops, best of 3: 9.58 µs per loop
I'm posting a new version of your patch that is pretty much "optimal":
sage: timeit('vector(ZZ,50)') 625 loops, best of 3: 9.7 µs per loop
Changed 11 years ago by
Attachment: | trac_11657-rewrite.patch added |
---|
apply only this; total rewrite of the previous patch
comment:4 follow-up: 5 Changed 11 years ago by
Now Rob (say) has to review this new patch. It significantly speeds up vector and zero_vector. It also moves some error checking to FreeModule?, where it belongs.
comment:5 Changed 11 years ago by
Status: | needs_review → needs_info |
---|
Replying to was:
Now Rob (say) has to review this new patch.
Yes, I've got it. Looks good and nearly optimal. Nothing much to add in the way of performance.
Comments:
Line 127 - typo in comments "slowedue"
Line 447 - return statement with zero vector should now be dead code - at least we want it to be. Commenting-it out and running tests confirms. The comment preceding needs adjustment.
Line 665 - is the whole discussion about "or" now moot with the super-fast is_Ring()
defined earlier?
Line 670 - error message: can we say "first argument must be a ring" rather than "arg0 must be a ring"?
Reactions? I can make a follow-up patch for you to review, or vice-versa?
Changed 11 years ago by
Attachment: | trac_11657-zero-vector-edits.patch added |
---|
comment:6 Changed 11 years ago by
Authors: | Rob Beezer → William Stein, Rob Beezer |
---|---|
Description: | modified (diff) |
Reviewers: | → Rob Beezer, William Stein |
Status: | needs_info → needs_review |
"rewrite" patch has positive review from me.
"edits" patch addresses all four comments above. Needs review.
Combination of two patches passes all tests. Timings are now very similar for all four ways to get a zero vector - within about 3 microseconds difference in overhead.
comment:7 Changed 11 years ago by
Keywords: | sd32 added |
---|
comment:8 Changed 11 years ago by
Status: | needs_review → positive_review |
---|
Looks great -- thanks for the cleanup!
comment:9 Changed 11 years ago by
Merged in: | → sage-4.7.2.alpha3 |
---|---|
Resolution: | → fixed |
Status: | positive_review → closed |
The vector constructor has to do a fair amount of checking to accomodate all of its flexibility. There is a numpy type check that could probably go away due to a subsequent call to Sequence() in prepare(), though it does not appear this is the problem (verbose() says 0.0 seconds, is it granular enough?).
I think this Sequence call is used to find a common parent for the types of the entries. However, it is about 1/3 of the time.
It would be possible to have the routine return the zero vector sooner, since there is no type checking involved. With a crude fix that passes all tests in sage/matrix, sage/modules, there is an improvement, but still slower.
Trouble is we need to support two-argument calls like
and so need to do some type-checking on the arguments, though maybe it could be reduced slightly.