Opened 10 years ago

Closed 9 years ago

#10628 closed enhancement (fixed)

initialization of matrices from vectors or list of lists can be way faster

Reported by: mderickx Owned by: jason, was
Priority: major Milestone: sage-5.0
Component: linear algebra Keywords:
Cc: rbeezer, jason Merged in: sage-5.0.beta3
Authors: Maarten Derickx, Simon King Reviewers: Simon King
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Status badges

Description (last modified by mderickx)

This is demonstrated by the code below:

sage: M=MatrixSpace(GF(46337),400)
sage: m=M.random_element()
sage: m1=m.columns()
sage: time M(m1)
CPU times: user 3.06 s, sys: 0.44 s, total: 3.49 s
Wall time: 3.70 s
400 x 400 dense matrix over Finite Field of size 46337
sage: x=[]
sage: time map(x.extend,m1)
CPU times: user 0.81 s, sys: 0.02 s, total: 0.83 s
sage: time M(x)
CPU times: user 0.01 s, sys: 0.00 s, total: 0.01 s
Wall time: 0.01 s
400 x 400 dense matrix over Finite Field of size 46337
sage: M(x)==M(m1)
True

Apply:

  1. 10628-speedup-matrix-creation
  2. 10628-stacked-matrix-creation.patch

Attachments (2)

10628-stacked-matrix-creation.patch (4.0 KB) - added by SimonKing 10 years ago.
Allow mixed lists of lists/vectors/matrices in matrix creation
10628-speedup-matrix-creation (1.6 KB) - added by mderickx 9 years ago.

Download all attachments as: .zip

Change History (29)

comment:1 Changed 10 years ago by mderickx

  • Status changed from new to needs_review

timing with the new code:

sage: M=MatrixSpace(GF(46337),400)
sage: m=M.random_element()
sage: m1=m.columns()
sage: time M(m1)
CPU times: user 0.83 s, sys: 0.06 s, total: 0.88 s
Wall time: 1.03 s
400 x 400 dense matrix over Finite Field of size 46337

comment:2 Changed 10 years ago by rbeezer

Based on a quick look - two questions and a suggestion.

  1. Does this address the complaint at #10312? Fully, or partially?
  1. There is a command is_Vector(). Or do we just care that x[0] (and the rest of x) is made up of iterables? Would anything that can list itself qualify? Might there be a better test?

If you name your patch so it ends in ".patch" then Trac will give a nice red/green view of it in a web browser. Without, its just the raw patch test.

Thanks, Rob

comment:3 Changed 10 years ago by mderickx

  1. It adresses the complaint at 10312 partially, it's not as fast as their suggestion. But at least a lot faster.
    0 0.000539 0.000325
    50 0.031196 0.026185
    100 0.065819 0.051949
    150 0.199419 0.080579
    200 0.244128 0.108761
    250 0.389423 0.137844
    300 0.337158 0.165608
    350 0.473639 0.194574
    400 0.521386 0.220917
    450 0.676239 0.249057
    500 0.620335 0.277434
    550 0.755271 0.307235
    600 0.942296 0.333725
    650 0.881207 0.364136
    700 1.050021 0.393484
    750 1.101895 0.4192
    800 1.143609 0.447968
    850 1.348676 0.476717
    900 1.244891 0.504382
    950 1.557877 0.534018
    1000 1.497431 0.572604}}}
    Didn't look at sparse matrices.
    
    2. I didn't think about the testing, just left it as it was, since I just wanted the code to be faster, not to behave in different way. If you have suggestions for better testing I'm willing to add it to the patch, but still not feeling about figuring out the best way to test myself.
    

comment:4 Changed 10 years ago by mderickx

By the way, the best way to do solve these tickets is probably to write a cython "matrix_from_list_of_iterables" command and call that from the vector initialization, but I don't know if that would be worth it since most methods done on matrices will still be way slower than initializing them. I.e. this patch makes sure that initializing is no longer the bottle neck.

comment:5 Changed 10 years ago by mderickx

  • Cc rbeezer added

comment:6 follow-up: Changed 10 years ago by rbeezer

Well, the best thing to do would be to rewrite the entire matrix construction routines. ;-) I've made lots of changes to the vector constructor recently, but the matrix constructor always looks like a much bigger job.

The is_vector attribute check is pretty bad - an object could have that method and the method could return False. But I can understand the desire to just change small parts (which is fine). That said, is it safe to remove copy = False?

I'll run some tests overnight.

Rob

comment:7 in reply to: ↑ 6 Changed 10 years ago by rbeezer

Tests ran fine. Even so, I can't see that it is safe to drop the Copy = False line. It could be, I just can't chase my way through to be sure.

I'd feel a lot better about giving this a positive review if it were still there.

Rob

comment:8 Changed 10 years ago by mderickx

Since I combined the three different cases into one I thought it was saver to drop the Copy = False line. Making a copy when not really needed is saver then the other way around, it does increase the peak memory usage and the runtime tough. Since in the for loop we already create an entirely new list it will be safe to add copy=False for all three cases. If you think that change is ok I will add it. Else I will just add the copy=false in the vector case.

comment:9 Changed 10 years ago by SimonKing

  • Status changed from needs_review to needs_work

If you look at the init method of Matrix_generic_dense, you see that the given list of entries is simply assigned to the attribute self._entries, if copy is False, and a copy of the list of entries is assigned to self._entries otherwise.

Note that the init method of Matrix_generic_dense actually does not use the "copy" function: The copy is constructed explicitly. I could imagine that this is not optimal, but that's a different topic.

Now, is it safe to keep "copy=False" in your patch? I think so. You create a new list, both when you transform a list of vectors resp. a list of lists into a single list of entries. So, "copy=True" would be a waste of time.

Concerning the test for the presence of "is_vector": I think one could easily do better.

Whithout your patch, what one actually used was the method "list()" of vectors (see line 28 of your patch). With your patch, you use that a vector is iterable, by doing e.extend(v) in line 30 of your patch.

If I am not mistaken, good Python tradition is to duck test using the methods that are actually needed. Hence, without your patch, it would have been better to simply try to use "v.list()", and catch the attribute error. And with your patch, it would be better to simply try "e.extend(v)", and catch the error that results if v is not iterable.

Moreover, I really think that the following should be made work:

sage: MS = MatrixSpace(GF(5),4,2)
sage: MS0 = MatrixSpace(GF(5),2)
sage: MS([MS0.random_element(), MS0.random_element()])

Hence, the matrix constructor should also accept lists of matrices (not just list of vectors), and should automatically stack the matrices from the list.

The problem with duck typing via "list()" is that we could have matrices over a ring whose elements have a list() method. In that case, we would not want to use list() in the matrix constructor.

Perhaps one could proceed as follows:

         if isinstance(x, list) and len(x) > 0:
             # If the list has the expected length, then
             # we are likely to have a list of elements of
             # the base ring
             if len(x) < self.__ncols*self.__nrows:
                 e = []
                 for v in x:
                     try:
                         e.extend(v.list())
                     except AttributeError:
                         e.extend(v)
                 x = e
                 # x is a new list, hence, copy=False is OK
                 copy = False
             # Now, x presumably is a list of ring elements.
             if not rows:
                 new_x = []
                 for k in xrange(len(x)):
                     i = k % self.__ncols
                     j = k // self.__ncols
                     new_x.append( x[ i*self.__nrows + j ] )
                 x = new_x
         return self.__matrix_class(self, entries=x, copy=copy, coerce=coerce) 

Do you think that my idea makes sense? It would allow to define a matrix by a list of (lists, matrices, vectors) -- the three could actually be combined.

I have to leave office now, but I hope that I'll find time to create a patch later or tomorrow.

comment:10 Changed 10 years ago by SimonKing

  • Authors set to Maarten Derickx, Simon King
  • Priority changed from minor to major
  • Status changed from needs_work to needs_review

I have attached a patch with my suggested code (with some minor modifications) and a new doc test.

Also I can provide more timings. They show that the second patch not only provides more flexibility but also a mild speedup on top of what is achieved by the first patch.

Here are the timings on my computer for the examples you gave, with both, only the first, and no patch, respectively (I considered the wall time in each case):

sage: M=MatrixSpace(GF(46337),400)
sage: m=M.random_element()
sage: m1=m.columns()
sage: time M(m1)
400 x 400 dense matrix over Finite Field of size 46337
Time: CPU 0.68 s, Wall: 0.67 s
// without second patch: 0.72 s
// without patches: 1.46 s

sage: M=MatrixSpace(GF(46337),400)
sage: m=M.random_element()
sage: m1=m.columns()
sage: time M(m1)
400 x 400 dense matrix over Finite Field of size 46337
Time: CPU 0.66 s, Wall: 0.66 s
// without second patch: 0.66 s
// without patches: 1.38 s

The ticket description mentions the creation from list of lists, but there were no timings, so far.

sage: MS = MatrixSpace(GF(5),1000)
sage: L = 1000*[[GF(5)(i) for i in range(1000)]]
sage: %time M = MS(L)
CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s
Wall time: 0.03 s
// without second patch: 0.04 s
// without both patches: 5.14 s
sage: sum(L,[]) == M.list()
True

If a matrix is created from a plain list (not list of lists), the first patch had a slow down. The second patch brings it back to the original speed:

sage: MS = MatrixSpace(GF(5),4000)
# Matrix creation by a single list
sage: L = 4000*[GF(5)(i) for i in range(4000)]
sage: %time M = MS(L)
CPU times: user 0.20 s, sys: 0.02 s, total: 0.22 s
Wall time: 0.23 s
// without second patch: 0.38 s
// without both patches: 0.21 s

Here are the new doc tests, that would fail without the second patch:

            sage: MS = MatrixSpace(ZZ,4,2)
            sage: MS0 = MatrixSpace(ZZ,2)

        A list of matrices::

            sage: MS.matrix([MS0([1,2,3,4]), MS0([5,6,7,8])])
            [1 2]
            [3 4]
            [5 6]
            [7 8]

        A mixed list of matrices and vectors::

            sage: MS.matrix( [MS0([1,2,3,4])] + list(MS0([5,6,7,8])) )
            [1 2]
            [3 4]
            [5 6]
            [7 8]

Last, allow me to increase the priority of the ticket. An improvement from more than 5 seconds to less than 0.05 seconds is not "minor".

comment:11 Changed 10 years ago by SimonKing

Concerning review: I am running full long doctests now (so far, I only did the tests from sage/matrix).

As much as I know, cross-reviewing is OK. So, I could give the first patch a review (which will be positive, if all tests pass), and someone else could give my patch a review.

comment:12 Changed 10 years ago by SimonKing

  • Status changed from needs_review to needs_work

Too bad. I got a failure in devel/sage-main/doc/en/bordeaux_2008/elliptic_curves.rst. That failure does not occur with only the first patch.

Hence, back to the drawing board...

comment:13 follow-up: Changed 10 years ago by mderickx

Cool, thanks for showing interest in this ticket. I will try to look at your work if you have a updated patch witch passes the doctests.

comment:14 in reply to: ↑ 13 Changed 10 years ago by SimonKing

Replying to mderickx:

Cool, thanks for showing interest in this ticket. I will try to look at your work if you have a updated patch witch passes the doctests.

I see what happened.

I test whether the length of x coincides with the number of entries of the matrix. If it does, then it is assumed that it is a list of ring elements.

The error occurs if the matrix space has one row and one column. If the input is, say, [[0]], then the length of that list is one. Hence, with my patch, it is directly forwarded to the init method of the matrix class -- but it should send [0] to the init method.

Fixing it now, then repeating the timings...

comment:15 Changed 10 years ago by SimonKing

  • Milestone set to sage-4.7.2

The corner case is really annoying. It occurs if the number of columns is 1 -- because that means that both [1,2,3,...] and [[1],[2],[3],...] are valid input.

So, it was wrong that I used the length of the list as indicator of what happens.

comment:16 Changed 10 years ago by SimonKing

There is one more problem with duck typing. Unfortunately, NumberFieldElement_quadratic has the attribute list(). Hence, my patch will systematically fail on matrices over (quadratic) number fields.

So, it is more than just a corner case.

comment:17 Changed 10 years ago by SimonKing

  • Status changed from needs_work to needs_review

I updated my patch. Now, the doctests in sage/matrix, sage/schemes/elliptic_curves and doc/ pass.

Let me summarise the problem:

  • I think it should be possible to create a matrix from a list of vectors or matrices; actually it should be possible to combine both.
  • The old "ducktyping" of vectors is not suitable for matrices, because they don't have a method "is_vector()".
  • Both vectors and matrices have a method "list()", and that is exactly what is used in the pre-processing of data. Hence, why not just duck-type via "list()"?
  • Problem: Number field elements have that method as well. If v is a number field element, then v.list() returns a pair of numbers - that is not what we want.

My new patch works as follows:

  1. Preprocessing is needed if the input is not a plain list of elements. If the given list is too short or if the number of columns is one, then it is possible that the given list contains something that is not just a ring element (e.g., a vector, a matrix, or a list). That is line 1231 of my patch.
  2. Since pure duck typing turned out to be error prone, I test in line 1237 whether we have a list or tuple using "isinstance", in which case we extend our list of entries.
  3. Otherwise, duck typing is used to catch vectors and matrices. It turns out that both vectors and matrices have an attribute "row()" and an attribute "list()", whereas number field elements have no "row()". If we have list/vector, then we extend our list of entries by "v.list()".
  4. Otherwise, we suppose that we have an element of the base ring (or something that will eventually be converted into the base ring - that case is subject to a new doc test), and append it to our list of entries.

The doc test problems concerning number field elements vanished with the new patch version. Moreover, the speed is still fine. I am running all long doc tests now, and return to "needs review".

comment:18 Changed 10 years ago by SimonKing

  • Description modified (diff)
  • Reviewers set to Simon King

All long tests both in devel/sage/sage and devel/sage/doc pass. From that point of view, and since it brings quite a speed-up, I can give the first patch a positive review. I hope that someone else can review my patch as well.

comment:19 Changed 10 years ago by SimonKing

We have two patches. Maarten's patch got a positive review from me. My patch still needs a review from someone (most naturally from Maarten...).

comment:20 Changed 10 years ago by SimonKing

One of the patches still needs a review, even though sage days is now over...

comment:21 Changed 10 years ago by mderickx

  • Status changed from needs_review to needs_work

I wanted to start and see if all doctest pass, but it needs to be rebased first. I will do this, this evening.

mderickx@sage:/mnt/usb1/scratch/mderickx/sage-4.7.2.alpha2/devel/sage$ sage -hg qim -P http://trac.sagemath.org/sage_trac/raw-attachment/ticket/10628/10628-speedup-matrix-creation
adding 10628-speedup-matrix-creation to series file
applying 10628-speedup-matrix-creation
patching file sage/matrix/matrix_space.py
Hunk #2 succeeded at 1225 with fuzz 2 (offset 106 lines).
now at: 10628-speedup-matrix-creation
mderickx@sage:/mnt/usb1/scratch/mderickx/sage-4.7.2.alpha2/devel/sage$ sage -hg qim -P http://trac.sagemath.org/sage_trac/raw-attachment/ticket/10628/10628-stacked-matrix-creation.patch
adding 10628-stacked-matrix-creation.patch to series file
applying 10628-stacked-matrix-creation.patch
patching file sage/matrix/matrix_space.py
Hunk #1 FAILED at 1160
Hunk #2 FAILED at 1177
Hunk #3 FAILED at 1189
3 out of 3 hunks FAILED -- saving rejects to file sage/matrix/matrix_space.py.rej
patch failed, unable to continue (try -v)
patch failed, rejects left in working dir
errors during apply, please fix and refresh 10628-stacked-matrix-creation.patch

Changed 10 years ago by SimonKing

Allow mixed lists of lists/vectors/matrices in matrix creation

comment:22 Changed 10 years ago by SimonKing

  • Status changed from needs_work to needs_review

I have replaced the second patch. I hope that it now applies cleanly, so that the review can finally be finished!

comment:23 Changed 10 years ago by jason

  • Cc jason added

comment:24 Changed 9 years ago by mderickx

I just ran doctestst and

----------------------------------------------------------------------
All tests passed!
Total time for all tests: 667.4 seconds

comment:25 Changed 9 years ago by mderickx

I've read trough to code and everything looks sensible to me. There was a problem with reproducing the slowdown that my patch introces. On sage.math.washington.edu (with sage.4.8.alpha4) I get something between 0.30 and 0.33 seconds for the following test:

sage: MS = MatrixSpace(GF(5),4000)
# Matrix creation by a single list
sage: L = 4000*[GF(5)(i) for i in range(4000)]
sage: %time M = MS(L)

And that timing stays the same with either my, or both patches applied.

But since the timing is only not reproducible for my patch, this ticket can still have positive review.

Changed 9 years ago by mderickx

comment:26 Changed 9 years ago by mderickx

  • Description modified (diff)
  • Status changed from needs_review to positive_review

I noticed that I didn't actually put it to positive review. I tested if the patches applied to the new 5.0.beta1, and they did so with fuzz. I rebased them and they now apply without fuzz, all tests still pass hence now really a positive review.

comment:27 Changed 9 years ago by jdemeyer

  • Merged in set to sage-5.0.beta3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.