Opened 11 years ago
Closed 10 years ago
#10628 closed enhancement (fixed)
initialization of matrices from vectors or list of lists can be way faster
Reported by: | mderickx | Owned by: | jason, was |
---|---|---|---|
Priority: | major | Milestone: | sage-5.0 |
Component: | linear algebra | Keywords: | |
Cc: | rbeezer, jason | Merged in: | sage-5.0.beta3 |
Authors: | Maarten Derickx, Simon King | Reviewers: | Simon King |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
This is demonstrated by the code below:
sage: M=MatrixSpace(GF(46337),400) sage: m=M.random_element() sage: m1=m.columns() sage: time M(m1) CPU times: user 3.06 s, sys: 0.44 s, total: 3.49 s Wall time: 3.70 s 400 x 400 dense matrix over Finite Field of size 46337 sage: x=[] sage: time map(x.extend,m1) CPU times: user 0.81 s, sys: 0.02 s, total: 0.83 s sage: time M(x) CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s Wall time: 0.01 s 400 x 400 dense matrix over Finite Field of size 46337 sage: M(x)==M(m1) True
Apply:
Attachments (2)
Change History (29)
comment:1 Changed 11 years ago by
- Status changed from new to needs_review
comment:2 Changed 11 years ago by
Based on a quick look - two questions and a suggestion.
- Does this address the complaint at #10312? Fully, or partially?
- There is a command
is_Vector()
. Or do we just care thatx[0]
(and the rest of x) is made up of iterables? Would anything that can list itself qualify? Might there be a better test?
If you name your patch so it ends in ".patch" then Trac will give a nice red/green view of it in a web browser. Without, its just the raw patch test.
Thanks, Rob
comment:3 Changed 11 years ago by
- It adresses the complaint at 10312 partially, it's not as fast as their suggestion. But at least a lot faster.
0 0.000539 0.000325 50 0.031196 0.026185 100 0.065819 0.051949 150 0.199419 0.080579 200 0.244128 0.108761 250 0.389423 0.137844 300 0.337158 0.165608 350 0.473639 0.194574 400 0.521386 0.220917 450 0.676239 0.249057 500 0.620335 0.277434 550 0.755271 0.307235 600 0.942296 0.333725 650 0.881207 0.364136 700 1.050021 0.393484 750 1.101895 0.4192 800 1.143609 0.447968 850 1.348676 0.476717 900 1.244891 0.504382 950 1.557877 0.534018 1000 1.497431 0.572604}}} Didn't look at sparse matrices. 2. I didn't think about the testing, just left it as it was, since I just wanted the code to be faster, not to behave in different way. If you have suggestions for better testing I'm willing to add it to the patch, but still not feeling about figuring out the best way to test myself.
comment:4 Changed 11 years ago by
By the way, the best way to do solve these tickets is probably to write a cython "matrix_from_list_of_iterables" command and call that from the vector initialization, but I don't know if that would be worth it since most methods done on matrices will still be way slower than initializing them. I.e. this patch makes sure that initializing is no longer the bottle neck.
comment:5 Changed 11 years ago by
- Cc rbeezer added
comment:6 follow-up: ↓ 7 Changed 11 years ago by
Well, the best thing to do would be to rewrite the entire matrix construction routines. ;-) I've made lots of changes to the vector constructor recently, but the matrix constructor always looks like a much bigger job.
The is_vector
attribute check is pretty bad - an object could have that method and the method could return False
. But I can understand the desire to just change small parts (which is fine). That said, is it safe to remove copy = False
?
I'll run some tests overnight.
Rob
comment:7 in reply to: ↑ 6 Changed 11 years ago by
Tests ran fine. Even so, I can't see that it is safe to drop the Copy = False
line. It could be, I just can't chase my way through to be sure.
I'd feel a lot better about giving this a positive review if it were still there.
Rob
comment:8 Changed 11 years ago by
Since I combined the three different cases into one I thought it was saver to drop the Copy = False line. Making a copy when not really needed is saver then the other way around, it does increase the peak memory usage and the runtime tough. Since in the for loop we already create an entirely new list it will be safe to add copy=False for all three cases. If you think that change is ok I will add it. Else I will just add the copy=false in the vector case.
comment:9 Changed 11 years ago by
- Status changed from needs_review to needs_work
If you look at the init method of Matrix_generic_dense
, you see that the given list of entries is simply assigned to the attribute self._entries
, if copy is False, and a copy of the list of entries is assigned to self._entries
otherwise.
Note that the init method of Matrix_generic_dense
actually does not use the "copy" function: The copy is constructed explicitly. I could imagine that this is not optimal, but that's a different topic.
Now, is it safe to keep "copy=False" in your patch? I think so. You create a new list, both when you transform a list of vectors resp. a list of lists into a single list of entries. So, "copy=True" would be a waste of time.
Concerning the test for the presence of "is_vector": I think one could easily do better.
Whithout your patch, what one actually used was the method "list()" of vectors (see line 28 of your patch). With your patch, you use that a vector is iterable, by doing e.extend(v) in line 30 of your patch.
If I am not mistaken, good Python tradition is to duck test using the methods that are actually needed. Hence, without your patch, it would have been better to simply try to use "v.list()", and catch the attribute error. And with your patch, it would be better to simply try "e.extend(v)", and catch the error that results if v is not iterable.
Moreover, I really think that the following should be made work:
sage: MS = MatrixSpace(GF(5),4,2) sage: MS0 = MatrixSpace(GF(5),2) sage: MS([MS0.random_element(), MS0.random_element()])
Hence, the matrix constructor should also accept lists of matrices (not just list of vectors), and should automatically stack the matrices from the list.
The problem with duck typing via "list()" is that we could have matrices over a ring whose elements have a list() method. In that case, we would not want to use list() in the matrix constructor.
Perhaps one could proceed as follows:
if isinstance(x, list) and len(x) > 0: # If the list has the expected length, then # we are likely to have a list of elements of # the base ring if len(x) < self.__ncols*self.__nrows: e = [] for v in x: try: e.extend(v.list()) except AttributeError: e.extend(v) x = e # x is a new list, hence, copy=False is OK copy = False # Now, x presumably is a list of ring elements. if not rows: new_x = [] for k in xrange(len(x)): i = k % self.__ncols j = k // self.__ncols new_x.append( x[ i*self.__nrows + j ] ) x = new_x return self.__matrix_class(self, entries=x, copy=copy, coerce=coerce)
Do you think that my idea makes sense? It would allow to define a matrix by a list of (lists, matrices, vectors) -- the three could actually be combined.
I have to leave office now, but I hope that I'll find time to create a patch later or tomorrow.
comment:10 Changed 11 years ago by
- Priority changed from minor to major
- Status changed from needs_work to needs_review
I have attached a patch with my suggested code (with some minor modifications) and a new doc test.
Also I can provide more timings. They show that the second patch not only provides more flexibility but also a mild speedup on top of what is achieved by the first patch.
Here are the timings on my computer for the examples you gave, with both, only the first, and no patch, respectively (I considered the wall time in each case):
sage: M=MatrixSpace(GF(46337),400) sage: m=M.random_element() sage: m1=m.columns() sage: time M(m1) 400 x 400 dense matrix over Finite Field of size 46337 Time: CPU 0.68 s, Wall: 0.67 s // without second patch: 0.72 s // without patches: 1.46 s sage: M=MatrixSpace(GF(46337),400) sage: m=M.random_element() sage: m1=m.columns() sage: time M(m1) 400 x 400 dense matrix over Finite Field of size 46337 Time: CPU 0.66 s, Wall: 0.66 s // without second patch: 0.66 s // without patches: 1.38 s
The ticket description mentions the creation from list of lists, but there were no timings, so far.
sage: MS = MatrixSpace(GF(5),1000) sage: L = 1000*[[GF(5)(i) for i in range(1000)]] sage: %time M = MS(L) CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s Wall time: 0.03 s // without second patch: 0.04 s // without both patches: 5.14 s sage: sum(L,[]) == M.list() True
If a matrix is created from a plain list (not list of lists), the first patch had a slow down. The second patch brings it back to the original speed:
sage: MS = MatrixSpace(GF(5),4000) # Matrix creation by a single list sage: L = 4000*[GF(5)(i) for i in range(4000)] sage: %time M = MS(L) CPU times: user 0.20 s, sys: 0.02 s, total: 0.22 s Wall time: 0.23 s // without second patch: 0.38 s // without both patches: 0.21 s
Here are the new doc tests, that would fail without the second patch:
sage: MS = MatrixSpace(ZZ,4,2) sage: MS0 = MatrixSpace(ZZ,2) A list of matrices:: sage: MS.matrix([MS0([1,2,3,4]), MS0([5,6,7,8])]) [1 2] [3 4] [5 6] [7 8] A mixed list of matrices and vectors:: sage: MS.matrix( [MS0([1,2,3,4])] + list(MS0([5,6,7,8])) ) [1 2] [3 4] [5 6] [7 8]
Last, allow me to increase the priority of the ticket. An improvement from more than 5 seconds to less than 0.05 seconds is not "minor".
comment:11 Changed 11 years ago by
Concerning review: I am running full long doctests now (so far, I only did the tests from sage/matrix).
As much as I know, cross-reviewing is OK. So, I could give the first patch a review (which will be positive, if all tests pass), and someone else could give my patch a review.
comment:12 Changed 11 years ago by
- Status changed from needs_review to needs_work
Too bad. I got a failure in devel/sage-main/doc/en/bordeaux_2008/elliptic_curves.rst. That failure does not occur with only the first patch.
Hence, back to the drawing board...
comment:13 follow-up: ↓ 14 Changed 11 years ago by
Cool, thanks for showing interest in this ticket. I will try to look at your work if you have a updated patch witch passes the doctests.
comment:14 in reply to: ↑ 13 Changed 11 years ago by
Replying to mderickx:
Cool, thanks for showing interest in this ticket. I will try to look at your work if you have a updated patch witch passes the doctests.
I see what happened.
I test whether the length of x coincides with the number of entries of the matrix. If it does, then it is assumed that it is a list of ring elements.
The error occurs if the matrix space has one row and one column. If the input is, say, [[0]]
, then the length of that list is one. Hence, with my patch, it is directly forwarded to the init method of the matrix class -- but it should send [0]
to the init method.
Fixing it now, then repeating the timings...
comment:15 Changed 11 years ago by
- Milestone set to sage-4.7.2
The corner case is really annoying. It occurs if the number of columns is 1 -- because that means that both [1,2,3,...]
and [[1],[2],[3],...]
are valid input.
So, it was wrong that I used the length of the list as indicator of what happens.
comment:16 Changed 11 years ago by
There is one more problem with duck typing. Unfortunately, NumberFieldElement_quadratic
has the attribute list()
. Hence, my patch will systematically fail on matrices over (quadratic) number fields.
So, it is more than just a corner case.
comment:17 Changed 11 years ago by
- Status changed from needs_work to needs_review
I updated my patch. Now, the doctests in sage/matrix, sage/schemes/elliptic_curves and doc/ pass.
Let me summarise the problem:
- I think it should be possible to create a matrix from a list of vectors or matrices; actually it should be possible to combine both.
- The old "ducktyping" of vectors is not suitable for matrices, because they don't have a method "is_vector()".
- Both vectors and matrices have a method "list()", and that is exactly what is used in the pre-processing of data. Hence, why not just duck-type via "list()"?
- Problem: Number field elements have that method as well. If v is a number field element, then v.list() returns a pair of numbers - that is not what we want.
My new patch works as follows:
- Preprocessing is needed if the input is not a plain list of elements. If the given list is too short or if the number of columns is one, then it is possible that the given list contains something that is not just a ring element (e.g., a vector, a matrix, or a list). That is line 1231 of my patch.
- Since pure duck typing turned out to be error prone, I test in line 1237 whether we have a list or tuple using "isinstance", in which case we extend our list of entries.
- Otherwise, duck typing is used to catch vectors and matrices. It turns out that both vectors and matrices have an attribute "row()" and an attribute "list()", whereas number field elements have no "row()". If we have list/vector, then we extend our list of entries by "v.list()".
- Otherwise, we suppose that we have an element of the base ring (or something that will eventually be converted into the base ring - that case is subject to a new doc test), and append it to our list of entries.
The doc test problems concerning number field elements vanished with the new patch version. Moreover, the speed is still fine. I am running all long doc tests now, and return to "needs review".
comment:18 Changed 11 years ago by
- Description modified (diff)
- Reviewers set to Simon King
All long tests both in devel/sage/sage and devel/sage/doc pass. From that point of view, and since it brings quite a speed-up, I can give the first patch a positive review. I hope that someone else can review my patch as well.
comment:19 Changed 10 years ago by
We have two patches. Maarten's patch got a positive review from me. My patch still needs a review from someone (most naturally from Maarten...).
comment:20 Changed 10 years ago by
One of the patches still needs a review, even though sage days is now over...
comment:21 Changed 10 years ago by
- Status changed from needs_review to needs_work
I wanted to start and see if all doctest pass, but it needs to be rebased first. I will do this, this evening.
mderickx@sage:/mnt/usb1/scratch/mderickx/sage-4.7.2.alpha2/devel/sage$ sage -hg qim -P http://trac.sagemath.org/sage_trac/raw-attachment/ticket/10628/10628-speedup-matrix-creation adding 10628-speedup-matrix-creation to series file applying 10628-speedup-matrix-creation patching file sage/matrix/matrix_space.py Hunk #2 succeeded at 1225 with fuzz 2 (offset 106 lines). now at: 10628-speedup-matrix-creation mderickx@sage:/mnt/usb1/scratch/mderickx/sage-4.7.2.alpha2/devel/sage$ sage -hg qim -P http://trac.sagemath.org/sage_trac/raw-attachment/ticket/10628/10628-stacked-matrix-creation.patch adding 10628-stacked-matrix-creation.patch to series file applying 10628-stacked-matrix-creation.patch patching file sage/matrix/matrix_space.py Hunk #1 FAILED at 1160 Hunk #2 FAILED at 1177 Hunk #3 FAILED at 1189 3 out of 3 hunks FAILED -- saving rejects to file sage/matrix/matrix_space.py.rej patch failed, unable to continue (try -v) patch failed, rejects left in working dir errors during apply, please fix and refresh 10628-stacked-matrix-creation.patch
comment:22 Changed 10 years ago by
- Status changed from needs_work to needs_review
I have replaced the second patch. I hope that it now applies cleanly, so that the review can finally be finished!
comment:23 Changed 10 years ago by
- Cc jason added
comment:24 Changed 10 years ago by
I just ran doctestst and
---------------------------------------------------------------------- All tests passed! Total time for all tests: 667.4 seconds
comment:25 Changed 10 years ago by
I've read trough to code and everything looks sensible to me. There was a problem with reproducing the slowdown that my patch introces. On sage.math.washington.edu (with sage.4.8.alpha4) I get something between 0.30 and 0.33 seconds for the following test:
sage: MS = MatrixSpace(GF(5),4000) # Matrix creation by a single list sage: L = 4000*[GF(5)(i) for i in range(4000)] sage: %time M = MS(L)
And that timing stays the same with either my, or both patches applied.
But since the timing is only not reproducible for my patch, this ticket can still have positive review.
Changed 10 years ago by
comment:26 Changed 10 years ago by
- Description modified (diff)
- Status changed from needs_review to positive_review
I noticed that I didn't actually put it to positive review. I tested if the patches applied to the new 5.0.beta1, and they did so with fuzz. I rebased them and they now apply without fuzz, all tests still pass hence now really a positive review.
comment:27 Changed 10 years ago by
- Merged in set to sage-5.0.beta3
- Resolution set to fixed
- Status changed from positive_review to closed
timing with the new code: