Opened 6 years ago

Closed 2 years ago

Last modified 2 years ago

#15104 closed enhancement (fixed)

Special case modn_dense matrix operations to improve performance

Reported by: nbruin
Owned by:
Priority: major
Milestone: sage-8.0
Component: linear algebra
Keywords: sd86.5
Cc: SimonKing, nthiery
Merged in:
Authors: Nils Bruin, Simon King
Reviewers: Travis Scrimshaw
Report Upstream: N/A
Work issues:
Branch: 5c878c7
Commit:
Dependencies:
Stopgaps:

Description

Presently, taking the transpose of a modn_dense matrix of smallish size is much more expensive than constructing its right kernel, because the method is just the generic implementation. We can make this much faster. The same holds for submatrix and stack.
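For reference, the operations in question can be compared directly in a session (sizes are illustrative; concrete timings appear in the comments below):

sage: M = matrix(GF(5), 6, 6, range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')
sage: %timeit M.transpose()
sage: %timeit M.stack(M)
sage: %timeit M.submatrix(1, 1)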

Attachments (1)

15104_modn_dense.patch (15.6 KB) - added by nbruin 6 years ago.
Various optimizations for modn_dense matrices (faster transpose etc.)


Change History (55)

comment:1 Changed 6 years ago by nbruin

  • Authors set to Nils Bruin
  • Status changed from new to needs_review

Straightforward special-case implementations of transpose, stack, and submatrix for Matrix_modn_dense_template, i.e., for Matrix_modn_dense_float and Matrix_modn_dense_double. The same optimization would probably benefit many other implementations.

Changed 6 years ago by nbruin

Various optimizations for modn_dense matrices (faster transpose etc.)

comment:2 Changed 6 years ago by nbruin

As it turns out, special-casing right_kernel_matrix is also very worthwhile.

comment:3 Changed 6 years ago by nbruin

(remnants of a comment that was erroneously posted here)

Last edited 6 years ago by nbruin

comment:4 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:5 Changed 5 years ago by tscrim

  • Branch set to u/tscrim/ticket/15104
  • Commit set to b2d2f1baecf561520bbb4458d4856609546dbfe1
  • Status changed from needs_review to needs_work

Here's the git version based on 6.1, and I've made a few docstring touchups. However, this currently leads to a regression. Here are my timings:

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')
1000 loops, best of 3: 536 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
1000 loops, best of 3: 556 us per loop
sage: %timeit M.transpose()
10000 loops, best of 3: 19.8 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 165 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 38.3 us per loop

Before:

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed') 
1000 loops, best of 3: 250 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
10000 loops, best of 3: 114 us per loop
sage: %timeit M.transpose()
100000 loops, best of 3: 14 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 51.2 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 37.5 us per loop

New commits:

a84c802  Various optimizations for modn_dense matrices (faster transpose etc.)
b2d2f1b  Some reviewer docstring tweaks.

comment:6 Changed 5 years ago by nbruin

I can confirm that the branch attached here seems to lead to a performance regression relative to 6.0. I wonder what happened in between. Did empty matrix creation get better? Did Cython get better? It's pretty obvious that a couple of operations here should be way faster and have virtually no overhead, whereas the generic implementations definitely do have overhead. It just seems that the overhead is not as big as it used to be, making this ticket less essential.

The good news is that the performance quoted on #15113 can now be obtained on vanilla 6.0, whereas before I definitely needed the patch here. I think optimizations in the spirit of what is proposed here are still worth considering, but they need to be reevaluated in the light of the present state, which is happily much better than 5 months ago!

Last edited 5 years ago by nbruin

comment:7 Changed 5 years ago by nbruin

OK, staring at some profile information: it seems that in the transpose, stack, and submatrix cases, virtually all time is spent creating the parent. The self.new_matrix call is a much better way of getting a new matrix; when I change that, the methods proposed here are again considerably faster. Since new_matrix is such a generic routine itself, I think it should be possible to save further overhead there -- but it is clearly better than the explicit parent creation the code here was using before.
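For illustration, the difference can be probed directly (a sketch; the shapes are arbitrary, and note that MatrixSpace itself caches its instances):

sage: M = matrix(GF(17), 20, 30, range(600))
sage: %timeit MatrixSpace(GF(17), 30, 20)        # explicit parent lookup/creation
sage: %timeit M.new_matrix(nrows=30, ncols=20)   # derive the parent from M's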

comment:8 Changed 5 years ago by nbruin

  • Branch changed from u/tscrim/ticket/15104 to u/nbruin/ticket/15104
  • Created changed from 08/27/13 00:10:09 to 08/27/13 00:10:09
  • Modified changed from 02/02/14 21:08:37 to 02/02/14 21:08:37

comment:9 Changed 5 years ago by git

  • Commit changed from b2d2f1baecf561520bbb4458d4856609546dbfe1 to a908e28159a544ca33f03dcbf0e8def3cfe9a60e

Branch pushed to git repo; I updated commit sha1. New commits:

a908e28  trac #15104: faster creation of new matrices (there should be even lower overhead solutions)

comment:10 Changed 5 years ago by nbruin

OK, I think I changed all the places where a new matrix is created. I think there's room for even further optimization by someone who understands the intricacies of matrix creation a little better. I'm now getting significantly better timings again. With the new branch:

sage: k=GF(17)
sage: n=20
sage: m=30
sage: M=matrix(k,n,m,[k.random_element() for j in range(n*m)])
sage: %timeit M.right_kernel_matrix(basis='computed')
10000 loops, best of 3: 88.2 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
10000 loops, best of 3: 91.5 us per loop
sage: %timeit M.transpose()
100000 loops, best of 3: 13.7 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 19.3 us per loop
sage: %timeit M.submatrix(1,1)
100000 loops, best of 3: 14.1 us per loop
sage: %timeit M.submatrix(3,nrows=5)
100000 loops, best of 3: 14 us per loop

On vanilla 6.0:

sage: k=GF(17)
sage: n=20
sage: m=30
sage: M=matrix(k,n,m,[k.random_element() for j in range(n*m)])
sage: %timeit M.right_kernel_matrix(basis='computed')
10000 loops, best of 3: 141 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
10000 loops, best of 3: 143 us per loop
sage: %timeit M.transpose()
10000 loops, best of 3: 22.2 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 50.3 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 37.1 us per loop
sage: %timeit M.submatrix(3,nrows=5)
10000 loops, best of 3: 22.3 us per loop

(For instance, for one example on #15113, applying this ticket means a run time of 681 ms instead of 731 ms. So the difference is significant.)

Last edited 5 years ago by nbruin

comment:11 Changed 5 years ago by nbruin

I tried a little experiment for matrix creation. Presently, in M.__neg__() the matrix gets created via

        M = self.__class__.__new__(self.__class__, self._parent,None,None,None)

leading to

sage: k=GF(17); M=matrix(20,30,[k.random_element() for i in range(600)])
sage: %timeit M.__neg__()
100000 loops, best of 3: 2.81 us per loop

If I change that line to

        M = self.new_matrix()

this becomes

sage: %timeit M.__neg__()
100000 loops, best of 3: 11.8 us per loop

so there are much lower overhead ways of creating matrices (if you already have a hold of the parent).

Of course, in submatrix and stack you don't have the parent in hand, and in transpose you only do if the matrix happens to be square (which may be worth special casing!). However, when you're doing frequent matrix computations, the matrices often end up having similar shapes, so the parents you're looking for are probably already available. This suggests that new_matrix should have access to a cache of parents (weakly referenced or rotating), so that it can do really quick (empty) matrix creation.
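A minimal sketch of the "rotating" variant: a small strong cache of the most recently used spaces, so that hot parents survive garbage collection. The names (_recent_spaces, cached_space) are illustrative, not Sage API:

from collections import OrderedDict
from sage.all import MatrixSpace, GF

_recent_spaces = OrderedDict()   # strong references to the ~16 hottest parents

def cached_space(R, nrows, ncols):
    key = (R, nrows, ncols)
    MS = _recent_spaces.pop(key, None)
    if MS is None:
        MS = MatrixSpace(R, nrows, ncols)
    _recent_spaces[key] = MS                  # (re)insert as most recent
    if len(_recent_spaces) > 16:
        _recent_spaces.popitem(last=False)    # evict the least recent
    return MS

assert cached_space(GF(17), 30, 20) is cached_space(GF(17), 30, 20)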

comment:12 Changed 5 years ago by SimonKing

  • Cc SimonKing added

comment:13 follow-up: Changed 5 years ago by SimonKing

In the ticket description, you say: "taking a transpose of a modn_dense matrix of smallish size is much more expensive than constructing the right kernel, because the method is just a generic implementation."

In contrast, I believe that there should be a generic implementation of the transpose, or at least there should be some generic helper methods for creating the transposed matrix.

For example, it would make sense to have a method of a matrix space returning the transposed matrix space, and this method should perhaps even be a cached method. If this helper were used in all custom implementations of .transpose(), the time to create the transposed matrix's parent could be reduced.
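A sketch of the proposed helper, written here as a standalone cached function (later on this ticket it became the lazy attribute transposed on matrix spaces, see comment:18):

from sage.all import MatrixSpace, GF, cached_function

@cached_function
def transposed_space(MS):
    # the matrix space with nrows and ncols swapped
    return MatrixSpace(MS.base_ring(), MS.ncols(), MS.nrows(),
                       sparse=MS.is_sparse())

MS = MatrixSpace(GF(17), 20, 30)
assert transposed_space(MS) is transposed_space(MS)   # cached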

Moreover, all matrices are supposed to provide fast set/get_unsafe() methods. Are you really sure that a generic implementation relying on these fast "unsafe" methods would be slower than a custom implementation such as the one in Matrix_modn_dense, which uses the underlying data structure directly rather than set/get_unsafe:

    def transpose(self):
        ...
        for i from 0 <= i < ncols:
            for j from 0 <= j < nrows:
                M._entries[j+i*nrows] = self._entries[i+j*ncols]    

And is this really the correct thing to do? Namely, in the unsafe set/get methods of Matrix_modn_dense_float, the operations are done on the attribute ._matrix, not ._entries. If I understand correctly, ._matrix is arranged in rows, whereas ._entries is the same chunk of memory, but not separated into rows. But then, wouldn't it be faster to use ._matrix as long as one stays in the same row (which, in the above loop over j, is the case for M)?
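For readers unfamiliar with the layout, the relationship between the two attributes can be modelled in plain Python (a toy illustration, not the actual C structures):

nrows, ncols = 2, 3
entries = [10, 11, 12, 20, 21, 22]      # flat row-major buffer, like ._entries
rows = [entries[j*ncols:(j+1)*ncols] for j in range(nrows)]   # row table, like ._matrix
j, i = 1, 2
assert entries[i + j*ncols] == rows[j][i] == 22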

Regardless whether we create a fully fledged generic implementation of .transpose() or just provide new helper methods: We need to address the following custom .transpose() methods:

matrix/matrix_integer_dense.pyx:4770:    def transpose(self):
matrix/matrix_mod2_dense.pyx:1430:    def transpose(self):
matrix/matrix_double_dense.pyx:2242:    def transpose(self):
matrix/matrix_sparse.pyx:365:    def transpose(self):
matrix/matrix_rational_dense.pyx:2356:    def transpose(self):
matrix/matrix_modn_sparse.pyx:684:    def transpose(self):
matrix/matrix_dense.pyx:131:    def transpose(self):
modules/free_module_element.pyx:1119:    def transpose(self):
misc/functional.py:1746:def transpose(x):
misc/table.py:366:    def transpose(self):
combinat/alternating_sign_matrix.py:311:    def transpose(self):
libs/ntl/ntl_mat_GF2E.pyx:538:    def transpose(ntl_mat_GF2E self):
libs/ntl/ntl_mat_GF2.pyx:559:    def transpose(ntl_mat_GF2 self):
matroids/lean_matrix.pyx:332:    cdef LeanMatrix transpose(self):
matroids/lean_matrix.pyx:695:    cdef LeanMatrix transpose(self):
matroids/lean_matrix.pyx:1114:    cdef LeanMatrix transpose(self):
matroids/lean_matrix.pyx:1734:    cdef LeanMatrix transpose(self):
matroids/lean_matrix.pyx:2250:    cdef LeanMatrix transpose(self):
matroids/lean_matrix.pyx:2706:    cdef LeanMatrix transpose(self):

comment:14 in reply to: ↑ 13 Changed 5 years ago by nbruin

Replying to SimonKing:

In the ticket description, you say: "taking a transpose of a modn_dense matrix of smallish size is much more expensive than constructing the right kernel, because the method is just a generic implementation."

In contrast, I believe that there should be a generic implementation of the transpose, or at least there should be some generic helper methods for creating the transposed matrix.

I'm not arguing with that. I'm just saying that we should override it on something like matrices over small finite prime fields, because the call overhead of the get_unsafe and set_unsafe functions is really noticeable relative to the straight C-level reshuffling of data one can do in this specific case.

For example, it should make sense to have a method of a matrix space returning the transposed matrix space, and this method should perhaps even be a cached method. When using this helper in all custom implementations of .transpose(), then the time to create the transposed matrix' parent could be reduced.

That might very well be a good idea. It doesn't quite solve the problem for stack/submatrix operations, though.

Moreover, all matrices are supposed to provide fast set/get_unsafe() methods. Are you really sure that a generic implementation relying on these fast "unsafe" methods would be slower than a custom implementation such as the one of Matrix_modn_dense that uses the underlying data structure directly, rather than using set/get_unsafe

Yes, for modn_dense it definitely is, because it boils down to reshuffling a C array of floats/doubles. I'm pretty sure the compiler isn't capable of inlining get_unsafe and set_unsafe.

                 M._entries[j+i*nrows] = self._entries[i+j*ncols]    

And of course, in this line we could even save some multiplications, but I wasn't sure whether that makes a proper difference on modern CPUs.

And is this really the correct thing to do? Namely, in the unsafe set/get method of Matrix_modn_dense_float, the operations are done on the attribute ._matrix, not ._entries. If I understand correctly, ._matrix is arranged in rows,

._matrix is an array of pointers into ._entries (and yes, in my code I assumed that Matrix_modn_dense_template always has its entries laid out in the same way. That assumption is made elsewhere in the file too). Looking up via ._matrix might save a multiplication at the expense of an extra memory access. My gut feeling was that the memory access was going to be worse, but I didn't extensively check. My guess would be that if the multiplications are costly, it's better to avoid them through repeated additions (we need the intermediate results) than by extra memory indirection. Perhaps replace everything by straight pointer arithmetic on a pointer starting at M._entries?

We need to address the following custom .transpose() methods:

We can reevaluate them with what we learn here, but I'm not sure that the same tradeoffs will apply. For instance, for an integer matrix we cannot expect the entries to be contiguously stored and of the same size, so the memory management just for the entries is already much more expensive.

comment:15 Changed 5 years ago by nbruin

Some timings. I changed dense_template.transpose to use the given parent for square matrices. With this inner copy loop:

        for i from 0 <= i < ncols:
            for j from 0 <= j < nrows:
                M._entries[j+i*nrows] = self._entries[i+j*ncols]    

I get:

sage: k=GF(17)
sage: A=matrix(k,100,101,[k.random_element() for i in range(100*101)])
sage: B=matrix(k,100,100,[k.random_element() for i in range(100*100)])
sage: %timeit At=A.transpose()
10000 loops, best of 3: 30.3 us per loop
sage: %timeit Bt=B.transpose()
100000 loops, best of 3: 11 us per loop

As you can see, parent creation overhead is still the main thing.

With this inner loop

        for i from 0 <= i < ncols:
            for j from 0 <= j < nrows:
                M._matrix[i][j] = self._matrix[j][i]    

I get:

sage: %timeit At=A.transpose()
10000 loops, best of 3: 35.8 us per loop
sage: %timeit Bt=B.transpose()
100000 loops, best of 3: 15.8 us per loop

as one of the better timings. Depending on how the matrix was created (but consistently once that was fixed), I was also seeing 23.4 us, which I guess happens if the ._matrix pointer array is unfortunately placed in memory relative to _entries (cache thrashing, perhaps).

I've also tried:

        Midx=0
        for i from 0 <= i < ncols:
            selfidx=i
            for j from 0 <= j < nrows:
                M._entries[Midx]=self._entries[selfidx]
                Midx+=1
                selfidx+=ncols

which was not really distinguishable from the first solution, but if anything, slightly slower. So my guess is that a multiplication is not something to worry about on modern CPUs.

comment:16 Changed 5 years ago by SimonKing

If I understand correctly, you say that we should keep all the custom transpose methods, since the matrix classes should know best how to create one of their instances efficiently.

Since you say that the creation of the parent has the most impact, I suggest adding a lazy attribute transposed to matrix spaces; this can be used to create a blank matrix of the correct dimensions that can then be filled by whatever custom method we have.

Variation of this theme: We could instead say that self.new_matrix() should not only check whether the to-be-created matrix lives in the same parent as self, but make a second special case for the transposed dimensions.
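A minimal sketch of that variation, as a plain helper function (illustrative, not the Sage API; transposed is the lazy attribute proposed above):

from sage.all import MatrixSpace

def parent_for_shape(M, nrows, ncols):
    # reuse M's parent for the same shape, the cached transposed space
    # for the swapped shape; build a new space only as a last resort
    P = M.parent()
    if (nrows, ncols) == (P.nrows(), P.ncols()):
        return P
    if (nrows, ncols) == (P.ncols(), P.nrows()):
        return P.transposed
    return MatrixSpace(P.base_ring(), nrows, ncols, sparse=P.is_sparse())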

comment:17 Changed 5 years ago by SimonKing

  • Branch changed from u/nbruin/ticket/15104 to u/SimonKing/ticket/15104
  • Modified changed from 02/06/14 08:13:13 to 02/06/14 08:13:13

comment:18 follow-up: Changed 5 years ago by SimonKing

  • Commit changed from a908e28159a544ca33f03dcbf0e8def3cfe9a60e to 0f008f9266e3ed6fbd67e7f3d357474825bdb160

With your branch, I got

sage: k=GF(17)
sage: A=matrix(k,100,101,[k.random_element() for i in range(100*101)])
sage: B=matrix(k,100,100,[k.random_element() for i in range(100*100)])
sage: %timeit At=A.transpose()
10000 loops, best of 3: 68.5 us per loop
sage: %timeit Bt=B.transpose()
10000 loops, best of 3: 48.7 us per loop

With the additional commits that I have just pushed, I get

sage: k=GF(17)
sage: A=matrix(k,100,101,[k.random_element() for i in range(100*101)])
sage: B=matrix(k,100,100,[k.random_element() for i in range(100*100)])
sage: %timeit At=A.transpose()
10000 loops, best of 3: 49.9 us per loop
sage: %timeit Bt=B.transpose()
10000 loops, best of 3: 49.3 us per loop

So, looks like progress.

My changes are: I added a lazy attribute to matrix spaces, returning the transposed matrix space, and I am using it in .new_matrix(). This has the advantage that it is a generic method used by different custom implementations of .transpose(). Hence, there should be a speed-up for all types of matrices, not only for Matrix_modn_dense_float.


New commits:

5487c67  Trac 15104: Faster creation of transposed matrix' parent
0f008f9  Trac 15104: Add a test for the new lazy attribute

comment:19 Changed 5 years ago by SimonKing

  • Authors changed from Nils Bruin to Nils Bruin, Simon King
  • Work issues set to regression in right_kernel_matrix

Concerning the issue raised by Travis in comment:5:

With the previous branch, I get

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')
1000 loops, best of 3: 370 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
1000 loops, best of 3: 378 us per loop
sage: %timeit M.transpose()
100000 loops, best of 3: 9.87 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 39.1 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 30.9 us per loop

With the new commits, I get

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')
1000 loops, best of 3: 375 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
1000 loops, best of 3: 382 us per loop
sage: %timeit M.transpose()
100000 loops, best of 3: 9.93 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 39.2 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 30.7 us per loop

With the current develop branch, I get

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')
1000 loops, best of 3: 231 us per loop
sage: %timeit M.right_kernel_matrix(basis='pivot')
1000 loops, best of 3: 232 us per loop
sage: %timeit M.transpose()
100000 loops, best of 3: 13.1 us per loop
sage: %timeit M.stack(M)
10000 loops, best of 3: 52.8 us per loop
sage: %timeit M.submatrix(1,1)
10000 loops, best of 3: 37.9 us per loop

So, there still remains something to do for right kernel matrix.

comment:20 Changed 5 years ago by SimonKing

Running M.right_kernel_matrix(basis='computed') 1000 times, prun yields

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.130    0.000    0.131    0.000 dynamic_class.py:324(dynamic_class_internal)
     1000    0.068    0.000    0.414    0.000 matrix_space.py:221(__init__)
     1000    0.038    0.000    0.580    0.001 {method 'right_kernel_matrix' of 'sage.matrix.matrix_modn_dense_float.Matrix_modn_dense_template' objects}
     1000    0.035    0.000    0.070    0.000 matrix_space.py:929(_get_matrix_class)
     1000    0.031    0.000    0.066    0.000 arith.py:407(is_prime)
     1000    0.028    0.000    0.470    0.000 matrix_space.py:145(__classcall__)
     1000    0.027    0.000    0.259    0.000 category.py:435(__classcall__)
    11000    0.023    0.000    0.023    0.000 {isinstance}
2000/1000    0.017    0.000    0.428    0.000 unique_representation.py:1006(__classcall__)
     1000    0.016    0.000    0.019    0.000 {method 'is_prime' of 'sage.rings.integer.Integer' objects}
     1000    0.015    0.000    0.192    0.000 category_types.py:324(__init__)
     8000    0.014    0.000    0.014    0.000 finite_field_prime_modn.py:272(order)
     1000    0.014    0.000    0.172    0.000 category.py:466(__init__)
        1    0.013    0.013    0.592    0.592 <string>:1(<module>)
2000/1000    0.010    0.000    0.419    0.000 {sage.misc.classcall_metaclass.typecall}
     1000    0.008    0.000    0.011    0.000 all.py:1(arithmetic)
     1000    0.008    0.000    0.276    0.000 modules.py:70(__classcall_private__)
     2000    0.008    0.000    0.012    0.000 weakref.py:223(__new__)
     2000    0.007    0.000    0.007    0.000 weakref.py:228(__init__)
     1000    0.007    0.000    0.015    0.000 category.py:1964(_make_named_class)
     1000    0.007    0.000    0.138    0.000 dynamic_class.py:122(dynamic_class)
     1000    0.006    0.000    0.265    0.000 vector_spaces.py:34(__classcall_private__)
     1000    0.006    0.000    0.010    0.000 finite_field_prime_modn.py:99(__cmp__)
     1000    0.005    0.000    0.177    0.000 category_types.py:202(__init__)
     1000    0.005    0.000    0.020    0.000 category.py:1238(subcategory_class)
     1000    0.005    0.000    0.197    0.000 vector_spaces.py:64(__init__)
     2000    0.004    0.000    0.004    0.000 {built-in method __new__ of type object at 0xb775ffc0}
     1000    0.004    0.000    0.006    0.000 category_types.py:206(_make_named_class_key)
     1000    0.004    0.000    0.006    0.000 rational_field.py:994(is_RationalField)
     1000    0.003    0.000    0.005    0.000 integer_mod_ring.py:148(is_IntegerModRing)
     1000    0.003    0.000    0.005    0.000 number_field.py:920(is_CyclotomicField)
     1000    0.003    0.000    0.003    0.000 fields.py:60(__contains__)
     1000    0.002    0.000    0.002    0.000 proof.py:151(get_flag)
     1000    0.002    0.000    0.002    0.000 proof.py:18(arithmetic)
     1000    0.002    0.000    0.002    0.000 {cmp}
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1095(is_dense)
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1433(nrows)
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1421(ncols)
     1000    0.002    0.000    0.002    0.000 {sage.rings.integer_ring.is_IntegerRing}
     1000    0.002    0.000    0.002    0.000 {method 'category' of 'sage.rings.ring.Ring' objects}
     1000    0.002    0.000    0.002    0.000 finite_field_prime_modn.py:202(characteristic)
     1000    0.002    0.000    0.002    0.000 {method 'has_key' of 'dict' objects}
     1000    0.002    0.000    0.002    0.000 {method 'base_ring' of 'sage.structure.category_object.CategoryObject' objects}
        1    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

with the current branch. So, a matrix space is created 1000 times---I suppose that this only happens because of weak caching.

comment:21 Changed 5 years ago by SimonKing

Here is the same in the master branch:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     1000    0.078    0.000    0.124    0.000 matrix_space.py:1187(matrix)
     1000    0.068    0.000    0.168    0.000 matrix_space.py:221(__init__)
     1000    0.058    0.000    0.453    0.000 {method 'right_kernel_matrix' of 'sage.matrix.matrix2.Matrix' objects}
     1000    0.034    0.000    0.069    0.000 matrix_space.py:911(_get_matrix_class)
    15000    0.032    0.000    0.032    0.000 {isinstance}
     1000    0.029    0.000    0.227    0.000 matrix_space.py:145(__classcall__)
     2000    0.017    0.000    0.031    0.000 misc.py:161(cputime)
     9000    0.013    0.000    0.013    0.000 {len}
        1    0.011    0.011    0.464    0.464 <string>:1(<module>)
     6000    0.010    0.000    0.010    0.000 finite_field_prime_modn.py:272(order)
     1000    0.010    0.000    0.014    0.000 category.py:435(__classcall__)
     2000    0.009    0.000    0.009    0.000 {resource.getrusage}
     1000    0.008    0.000    0.031    0.000 modules.py:70(__classcall_private__)
     2000    0.008    0.000    0.039    0.000 misc.py:392(verbose)
     1000    0.008    0.000    0.183    0.000 unique_representation.py:1006(__classcall__)
     4000    0.007    0.000    0.007    0.000 {method 'extend' of 'list' objects}
     1000    0.006    0.000    0.010    0.000 finite_field_prime_modn.py:99(__cmp__)
     1000    0.006    0.000    0.020    0.000 vector_spaces.py:34(__classcall_private__)
     1000    0.006    0.000    0.174    0.000 {sage.misc.classcall_metaclass.typecall}
     1000    0.005    0.000    0.129    0.000 matrix_space.py:387(__call__)
     1000    0.004    0.000    0.006    0.000 weakref.py:223(__new__)
     1000    0.004    0.000    0.004    0.000 weakref.py:228(__init__)
     1000    0.004    0.000    0.006    0.000 group_element.py:77(is_MatrixGroupElement)
     1000    0.003    0.000    0.006    0.000 rational_field.py:994(is_RationalField)
     1000    0.003    0.000    0.005    0.000 integer_mod_ring.py:148(is_IntegerModRing)
     2000    0.003    0.000    0.003    0.000 finite_field_prime_modn.py:202(characteristic)
     1000    0.003    0.000    0.005    0.000 number_field.py:920(is_CyclotomicField)
     1000    0.003    0.000    0.003    0.000 fields.py:60(__contains__)
     1000    0.002    0.000    0.002    0.000 {cmp}
     1000    0.002    0.000    0.002    0.000 {built-in method __new__ of type object at 0xb774dfc0}
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1077(is_dense)
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1415(nrows)
     1000    0.002    0.000    0.002    0.000 matrix_space.py:1403(ncols)
     1000    0.002    0.000    0.002    0.000 {sage.rings.integer_ring.is_IntegerRing}
     1000    0.002    0.000    0.002    0.000 {sage.matrix.matrix.is_Matrix}
     1000    0.002    0.000    0.002    0.000 {method 'base_ring' of 'sage.structure.category_object.CategoryObject' objects}
        1    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

So, we see the following regression in cumulative time:

vector_spaces.py:34(__classcall_private__) 0.020 --> 0.265

In fact, with the current ticket branch we see the line 1000 0.005 0.000 0.197 0.000 vector_spaces.py:64(__init__), but it does not occur at all with the master branch.

Hence, I guess to fix the regression we need to address this question: Why is the category of vector spaces (probably the one over GF(5)) not taken from cache?

Last edited 5 years ago by SimonKing

comment:22 Changed 5 years ago by SimonKing

By the way, note that there is a lot of trailing whitespace in matrix_modn_dense_template.pxi.

comment:23 Changed 5 years ago by SimonKing

For testing, I added a print statement in the __init__ method of vector spaces, and with

sage: M=matrix(GF(5),6,6,range(36))
sage: for i in range(1000):
....:     N = M.right_kernel_matrix(basis='computed')
....:     del N
....:     

I find that indeed the category of vector spaces over GF(5) is created in each round. That's very surprising! After all, the category of vector spaces is supposed to be an immediate super category of the category that the matrix space belongs to (algebras over GF(5)).

So, what has changed that could explain why the category of vector spaces over a given field vanishes from the cache with the ticket branch, but doesn't vanish with the master branch?
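One way to probe this suspicion in a session (whether the weakly cached category actually dies depends on what else holds a reference to it):

sage: import gc, weakref
sage: ref = weakref.ref(VectorSpaces(GF(5)))
sage: _ = gc.collect()
sage: ref() is None   # True when nothing else kept the category alive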

comment:24 Changed 5 years ago by SimonKing

Now I am totally puzzled. I cannot reproduce the above finding concerning the cache.

comment:25 Changed 5 years ago by SimonKing

Aha! I can reproduce it when I do

sage: M=matrix(GF(5),6,6,range(36))
sage: for i in range(1000):
....:     N = M.right_kernel_matrix(basis='computed')
....:     del N
....:     

But after doing M = random_matrix(GF(5),10,7), the cache problem gets fixed permanently: redefining M again by M=matrix(GF(5),6,6,range(36)) does not make the problem reappear.

It seems that the matrix function somehow manages to work around the cache for vector space categories. Odd.

comment:26 Changed 5 years ago by SimonKing

The following should show the effect of properly caching the vector space category:

sage: M=matrix(GF(5),6,6,range(36))
sage: %timeit M.right_kernel_matrix(basis='computed')  # category is repeatedly computed
1000 loops, best of 3: 377 us per loop
sage: N=random_matrix(GF(5),10,7)     # this somehow triggers a proper caching
sage: %timeit M.right_kernel_matrix(basis='computed')  # much faster with proper caching
10000 loops, best of 3: 190 us per loop

comment:27 Changed 5 years ago by SimonKing

The difference seems to be the category initialisation of the matrix space:

sage: M=matrix(GF(5),6,6,range(36))
sage: MS = M.parent()
sage: %timeit M.right_kernel_matrix(basis='computed')
1000 loops, best of 3: 377 us per loop
sage: MS.full_category_initialisation()
sage: %timeit M.right_kernel_matrix(basis='computed')
10000 loops, best of 3: 190 us per loop

There is a reason why the category of a matrix space is not fully initialised by default. So, making category initialisation the new default would be bad, IMHO.

First, we should try to understand why apparently the right_kernel_matrix operation in the master branch does result in caching the category, while the ticket branch doesn't.

Then we should identify some operations that should trigger a full category initialisation.

In principle, creating a strong cache for vector space categories would fix the regression; in a new sage session:

sage: M=matrix(GF(5),6,6,range(36))
sage: C = VectorSpaces(GF(5))
sage: %timeit M.right_kernel_matrix(basis='computed')
10000 loops, best of 3: 186 us per loop

But this would be prone to create memory leaks that we have previously been fixing.

Last edited 5 years ago by SimonKing

comment:28 Changed 5 years ago by SimonKing

Doing M=matrix(GF(5),6,6,range(36)) does not trigger the creation of the category of vector spaces over GF(5). I wonder why this is the case: after all, M knows its parent, and, as I have just checked, the category of vector spaces should always be used when creating a matrix space.

So, does the matrix function do anything funny when creating a matrix? Such as: Not creating the matrix space which the matrix belongs to?

comment:29 Changed 5 years ago by SimonKing

No, I was mistaken. The matrix space of 6x6 matrices over GF(5) is created when doing M=matrix(GF(5),6,6,range(36)). However, the category used for partial initialisation of the category is Algebras(GF(5)), not VectorSpaces(GF(5)).

In contrast to a full category initialisation, the partial initialisation does not involve looking up the super categories and parent class of Algebras(GF(5)). Hence, VectorSpaces(GF(5)) is not created. And later, when it is created, it is not permanently cached, for a reason that I don't understand yet.

So, where are we? We understand why creating a non-square matrix fixes the observed regression for square matrices over the same base field. However, we do not understand why the category of vector spaces over this base field fails to be strongly referenced.

comment:30 Changed 5 years ago by SimonKing

  • Cc nthiery added

It seems to me that the following happens when creating a square matrix and then computing right kernel:

  • The parent of the matrix is incompletely initialised (by #11900) in the category of algebras. This will not trigger the creation of super categories, as the incomplete category initialisation skips the category's parent class.
  • When computing the kernel, some temporary results are created that are non-square matrices. Their parents are initialised in the category of vector spaces.
  • The temporary results, their parents, and thus also the category of vector spaces will be garbage collected as soon as the kernel computation is done. Hence, when we compute another kernel, the category of vector spaces is created again. And that's the regression.

I just tested: If one adds the following initialisation method to the category of algebras, then the regression vanishes:

    def __init__(self, B):
        Category_over_base_ring.__init__(self, B)
        # touching the lazy attribute forces the super categories
        # (including the category of B-modules) to be computed and stored
        self._super_categories

It stores the category of B-modules as an attribute of the category of B-algebras. Hence, if we create a square matrix M over GF(5), then there will be a chain of strong references

M->M.parent()->Algebras(GF(5))->VectorSpaces(GF(5))

And thus the undue garbage collection of VectorSpaces(GF(5)), which seems to be the cause of the regression, will not take place.
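The category relationship behind that last link can be checked directly; the membership below is expected to hold, since an algebra over a field is in particular a vector space:

sage: VectorSpaces(GF(5)) in Algebras(GF(5)).all_super_categories()
True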

However, I can imagine that some code (number fields and elliptic curves are the usual suspects) will regress when the creation of the category of algebras always creates the super categories. After all, an observed regression was the reason for introducing the incomplete initialisation of matrix spaces in #11900.

What do you think we should do?

comment:31 follow-up: Changed 5 years ago by nthiery

Hmm interesting; fixing a speed regression by going too much in one direction is triggering another speed regression!

Would it make sense as a short term fix to implement now the alternative option mentioned in the documentation of MatrixSpace.full_category_initialisation? After all, having an object whose category is not completely initialized is prone to issues, and it's not unreasonable to have to ask explicitly for it when needed, rather than the converse.

        .. todo::

            Add instead an optional argument to :func:`MatrixSpace` to
            temporarily disable the category initialization in those
            special cases where speed is critical::

                sage: MS = MatrixSpace(QQ,7, init_category=False) # todo: not implemented
                sage: TestSuite(MS).run()                         # todo: not implemented
                Traceback (most recent call last):
                ...
                AssertionError: category of self improperly initialized

            until someone recreates explicitly the same matrix space
            without that optional argument::

                sage: MS = MatrixSpace(QQ,7)                      # todo: not implemented
                sage: TestSuite(MS).run()                         # todo: not implemented

Then, for the long run, would the proper fix be to aim at using Algebras(Fields()) (which is yet to be implemented) instead of Algebras(K) as category?

Cheers,

Nicolas

comment:32 Changed 5 years ago by SimonKing

Good news first

The proposed __init__ method for the category of algebras seems not to result in a severe regression in the benchmark examples from #11900. Or perhaps there is a regression in prove_BSD, but this would be due to the current commits, not due to the new init method.

First, with the added __init__ method for the category of algebras:

sage: %time D = J0(46).endomorphism_ring().discriminant()
CPU times: user 10.23 s, sys: 0.17 s, total: 10.40 s
Wall time: 10.42 s
sage: %time TestSuite(CrystalOfTableaux(['B',4],shape=[2,1,1,0])).run()
CPU times: user 3.24 s, sys: 0.02 s, total: 3.26 s
Wall time: 3.27 s
sage: W.<z> = CyclotomicField(13)
sage: %time M = Matrix(W, 2, 3, [10^30*(1-z)^13, 1, 2, 3, 4, z]).echelon_form()
CPU times: user 2.01 s, sys: 0.02 s, total: 2.03 s
Wall time: 2.05 s
sage: %time L = EllipticCurve('960d1').prove_BSD()
CPU times: user 4.62 s, sys: 0.07 s, total: 4.69 s
Wall time: 4.71 s
sage: def test(E):
....:     for p in prime_range(10000):
....:         if p != 389:
....:             G = E.change_ring(GF(p)).abelian_group()
....:             
sage: E = EllipticCurve('389a')
sage: %time test(E)
CPU times: user 26.80 s, sys: 0.09 s, total: 26.89 s
Wall time: 26.94 s
sage: E = J0(46).endomorphism_ring()
sage: %time g = E.gens()
CPU times: user 9.39 s, sys: 0.15 s, total: 9.54 s
Wall time: 9.56 s

Next, with the current commits:

sage: %time D = J0(46).endomorphism_ring().discriminant()
CPU times: user 9.98 s, sys: 0.19 s, total: 10.17 s
Wall time: 10.21 s
sage: %time TestSuite(CrystalOfTableaux(['B',4],shape=[2,1,1,0])).run()
CPU times: user 3.30 s, sys: 0.06 s, total: 3.36 s
Wall time: 3.23 s
sage: W.<z> = CyclotomicField(13)
sage: %time M = Matrix(W, 2, 3, [10^30*(1-z)^13, 1, 2, 3, 4, z]).echelon_form()
CPU times: user 2.02 s, sys: 0.01 s, total: 2.03 s
Wall time: 2.04 s
sage: %time L = EllipticCurve('960d1').prove_BSD()
CPU times: user 4.70 s, sys: 0.06 s, total: 4.77 s
Wall time: 4.79 s
sage: def test(E):
....:     for p in prime_range(10000):
....:         if p != 389:
....:             G = E.change_ring(GF(p)).abelian_group()
....:             
sage: E = EllipticCurve('389a')
sage: %time test(E)
CPU times: user 27.44 s, sys: 0.11 s, total: 27.55 s
Wall time: 27.60 s
sage: E = J0(46).endomorphism_ring()
sage: %time g = E.gens()
CPU times: user 9.17 s, sys: 0.19 s, total: 9.37 s
Wall time: 9.39 s

And finally, as baseline, the master branch:

sage: %time D = J0(46).endomorphism_ring().discriminant()
CPU times: user 10.07 s, sys: 0.20 s, total: 10.27 s
Wall time: 10.29 s
sage: %time TestSuite(CrystalOfTableaux(['B',4],shape=[2,1,1,0])).run()
CPU times: user 3.18 s, sys: 0.00 s, total: 3.18 s
Wall time: 3.19 s
sage: W.<z> = CyclotomicField(13)
sage: %time M = Matrix(W, 2, 3, [10^30*(1-z)^13, 1, 2, 3, 4, z]).echelon_form()
CPU times: user 1.96 s, sys: 0.02 s, total: 1.98 s
Wall time: 1.99 s
sage: %time L = EllipticCurve('960d1').prove_BSD()
CPU times: user 4.68 s, sys: 0.08 s, total: 4.76 s
Wall time: 4.78 s
sage: def test(E):
....:     for p in prime_range(10000):
....:         if p != 389:
....:             G = E.change_ring(GF(p)).abelian_group()
....:             
sage: E = EllipticCurve('389a')
sage: %time test(E)
CPU times: user 27.20 s, sys: 0.08 s, total: 27.28 s
Wall time: 27.33 s
sage: E = J0(46).endomorphism_ring()
sage: %time g = E.gens()
CPU times: user 9.25 s, sys: 0.16 s, total: 9.41 s
Wall time: 9.43 s

Hence, in these tests, which were previously the reason for the incomplete category initialisation of matrix spaces, there would be no regression from letting the category of algebras create and store its super categories upon initialisation.

What is your opinion on that idea?

Worries

Did some regression creep in without being noticed? The first example,

%time D = J0(46).endomorphism_ring().discriminant()

used to be faster, pre-#9138 and post-#11900.

I'd say this particular example should go on a separate ticket.

comment:33 Changed 5 years ago by SimonKing

I created #15792 for the existing regression.

comment:34 Changed 5 years ago by SimonKing

Indeed it seems we have had a bad regression since #11900. The good news in the bad news is: With a full category initialisation of matrix spaces, we wouldn't make things worse...

comment:35 in reply to: ↑ 18 Changed 5 years ago by nbruin

Replying to SimonKing:

My changes are: I added a lazy attribute to matrix spaces, returning the transposed matrix space, and I am using it in .new_matrix(). This has the advantage that it is a generic method used by different custom implementations of .transpose(). Hence, there should be a speed-up for all types of matrices, not only for Matrix_modn_dense_float.

True, but new_matrix does too much. Instantiating a new matrix via self.__class__.__new__(...) has the big advantage that it avoids the __init__, which has expensive argument processing and has to do a lot of work to see how the entries need to be initialized. But a lot of these inner routines don't care about initialization, because they'll fill the matrix themselves anyway (with a memcpy, possibly!). Your timings show that getting a quicker hold of the parent improves timings (and hence may be worth doing generically), but to get the factor >2 speed-up I was seeing, we probably need to avoid calling __init__. That's of course tricky on a generic level, but completely doable in specific classes (as already happens in modn_dense_template). Possibly improve the infrastructure for writing such optimizations?
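For concreteness, the fast idiom under discussion looks like this in a session (the same __class__.__new__ pattern quoted in comment:11; the result is a blank matrix that the caller must fill before use):

sage: M = matrix(GF(17), 20, 30, range(600))
sage: cls = type(M)
sage: blank = cls.__new__(cls, M.parent(), None, None, None)   # skips __init__
sage: %timeit cls.__new__(cls, M.parent(), None, None, None)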


New commits:

5487c67Trac 15104: Faster creation of transposed matrix' parent
0f008f9Trac 15104: Add a test for the new lazy attribute

comment:36 in reply to: ↑ 31 ; follow-up: Changed 5 years ago by nbruin

Replying to nthiery:

Would it make sense as a short term fix to implement now the alternative option mentioned in the documentation of MatrixSpace?.full_category_initialisation? After all, having an object whose category is not completely initialized is prone to issues, and it's not unreasonable to have to ask explicitly for it when needed, rather than the converse.

The problem there is that the person not requiring a fully initialized matrix space isn't asking for a matrix space at all: he/she is probably just asking for a matrix, and is doing that, quite reasonably, via linear algebra routines. Those routines *internally* need to find or construct an appropriate matrix space, and they need to be able to operate both in time-critical code, where full initialization of a never-used object is too expensive, and in situations where the result needs to be a full member of the Sage universe. I suspect it'll be too difficult to get the intent of the caller to the location where that intent is relevant.

In cases where the same size matrices end up being created repeatedly, ensuring that matrix spaces have a circular reference (most parents do) should let them stick around long enough (till next GC) for weak caching to be effective.

In number theoretic code, where one for instance needs to compute a lot of characteristic polynomials of same-sized matrices over a lot of distinct finite fields, the matrix spaces don't get used/reused at all, so any time spent on them is wasted.

comment:37 in reply to: ↑ 36 ; follow-up: Changed 5 years ago by SimonKing

Replying to nbruin:

The problem there is that the person not requiring a fully initialized matrix space isn't asking for a matrix space at all: he/she is probably just asking for a matrix and is doing that, quite reasonably, via linear algebra routines. Those routines *internally* find the need to find/construct an appropriate matrix space and these routines need to be able to operate both in time critical code where full initialization of a never-used object is too expensive and in situations where the result needs to be a full member of the sage universe. I suspect it'll be too difficult to get the intent of the caller to the location where that intent is relevant.

That used to be the rationale in #11900.

However, it seems that with the current version of Sage, the examples used in #11900 would not show a new regression if one now always fully initialised the matrix spaces!

So, the underlying rationale for this part of #11900 is gone. It would make sense to consider dropping the dirty trick of incomplete initialisation.

That said: It seems that some of the examples are now showing a regression even without full category initialisation of matrix spaces. That's what #15792 is about, and it currently is a mystery where the regression came from---apparently it isn't from matrix spaces.

comment:38 in reply to: ↑ 37 ; follow-up: Changed 5 years ago by nbruin

Replying to SimonKing:

So, the underlying rationale for this part of #11900 is gone. It would make sense to consider to drop the dirty trick of incomplete initialisation.

Hm, that might just be a matter of an old pothole in the road not being an immediate problem anymore because a tree has now fallen and is hampering traffic much more severely, if you can stomach a laboured analogy.

comment:39 in reply to: ↑ 38 Changed 5 years ago by SimonKing

Replying to nbruin:

Hm, that might just be a matter of an old pothole in the road not being an immediate problem anymore because a tree has now fallen and is hampering traffic much more severely

Quite possible. But so far I can't see the tree.

if you can stomach a laboured analogy.

I am having breakfast now, so, yes.

comment:40 Changed 5 years ago by SimonKing

At #15792, I have collected some tests that apparently became much slower in the past 2 years. I have no idea yet why they became slower.

Here, I am showcasing the effect of full category initialisation on the time needed to create a matrix space.

First, the current master branch, i.e., no full initialisation. This seems to be slower than 2 years ago, but not much.

sage: def test():
....:     for i in xrange(10000):
....:         MS = MatrixSpace(QQ,i)
....:         
sage: %time test()
CPU times: user 0.93 s, sys: 0.01 s, total: 0.93 s
Wall time: 0.93 s
sage: type(MatrixSpace(QQ,8))
<class 'sage.matrix.matrix_space.MatrixSpace'>

Now, I am repeating it with full category initialisation:

sage: def test():
....:     for i in xrange(10000):
....:         MS = MatrixSpace(QQ,i)
....:         
sage: %time test()
CPU times: user 1.74 s, sys: 0.00 s, total: 1.74 s
Wall time: 1.74 s
sage: type(MatrixSpace(QQ,8))
<class 'sage.matrix.matrix_space.MatrixSpace_with_category'>
sage: (1.74-0.93)/0.93
0.870967741935484

So, the regression "at its origin" would be 87%.

In other words, Nils, you are right: A tree did fall onto a pothole in the street.

comment:41 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:42 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:43 Changed 2 years ago by nbruin

I tried to resolve the merge conflicts and it was a bit of a mess (I got way more conflicts than justified by the commits listed). Please leave a message here if you have a good idea to resolve this and/or if you start working on it. Otherwise I'll try in the near future to just build a new branch to make the changes implemented here (and I'd leave a marker here once I start).

comment:44 Changed 2 years ago by nbruin

  • Branch changed from u/SimonKing/ticket/15104 to u/nbruin/ticket/15104

comment:45 Changed 2 years ago by nbruin

  • Commit changed from 0f008f9266e3ed6fbd67e7f3d357474825bdb160 to ce2cfdb336f6e6ced9372e157724e6df9fd624cc
  • Keywords sd86.5 added
  • Status changed from needs_work to needs_review
  • Work issues regression in right_kernel_matrix deleted

Rebased; ready for review.


New commits:

ce2cfdb  trac15104: some special cased modn_dense matrix operations for improved performance

comment:46 Changed 2 years ago by tscrim

  • Branch changed from u/nbruin/ticket/15104 to public/linear_algebra/modn_dense_speedup-15104
  • Commit changed from ce2cfdb336f6e6ced9372e157724e6df9fd624cc to 10fa935c280469669333a64b63686d5bcf197538
  • Milestone changed from sage-6.4 to sage-8.0
  • Reviewers set to Travis Scrimshaw

I made some doc improvements and fixed a typo that prevented this from compiling.

Using the same test order as comment:5:

10000 loops, best of 3: 93.2 µs per loop
10000 loops, best of 3: 95.5 µs per loop
100000 loops, best of 3: 3.95 µs per loop
10000 loops, best of 3: 89.3 µs per loop
100000 loops, best of 3: 8.08 µs per loop

vs develop:

10000 loops, best of 3: 132 µs per loop
10000 loops, best of 3: 133 µs per loop
100000 loops, best of 3: 2.73 µs per loop
100000 loops, best of 3: 6.69 µs per loop
100000 loops, best of 3: 8.38 µs per loop

So the right kernel matrices are faster, but not the transpose, stack (this one is really bad), and submatrix.

For the transpose, you are not using the self._parent.transposed lazy attribute.

For stack, the meat of the code should be in cdef _stack_impl(self, bottom): (see matrix1.pyx).

Instead of overriding submatrix, it would be better to define matrix_from_rows_and_columns.

However, I don't have an explanation for why these direct implementations are slower than their generic counterparts.

Addendum: Actually, a good part of the stack slowdown might be because you are doing the bulk of the work in a def function rather than a cdef.
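A rough sketch of that split (not the branch's actual code): stack in matrix1.pyx keeps the argument checks and dispatches to the subclass, which can then assume compatible parents and, thanks to the contiguous row-major _entries layout of matrix_modn_dense_template.pxi, copy each block with a single memcpy (cimported from libc.string; celement is the template's element type):

    cdef _stack_impl(self, bottom):
        # the caller (Matrix.stack) has already checked base ring and ncols
        cdef Matrix_modn_dense_template other = <Matrix_modn_dense_template> bottom
        cdef Matrix_modn_dense_template M = self.new_matrix(
            nrows=self._nrows + other._nrows, ncols=self._ncols)
        # self's rows, then bottom's rows: each is one contiguous block
        memcpy(M._entries, self._entries,
               sizeof(celement) * self._nrows * self._ncols)
        memcpy(M._entries + self._nrows * self._ncols, other._entries,
               sizeof(celement) * other._nrows * other._ncols)
        return M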


New commits:

10fa935  Cleanup doc and fix typo.
Last edited 2 years ago by tscrim

comment:47 Changed 2 years ago by git

  • Commit changed from 10fa935c280469669333a64b63686d5bcf197538 to 27f9467b3530468668522e310ce239870d0d023a

Branch pushed to git repo; I updated commit sha1. New commits:

27f9467  faster construction of new parents

comment:48 Changed 2 years ago by nbruin

Hm, somehow the changes for better construction of parents didn't make it in. Fixed now. At least nothing is worse now. However, without an improvement it's hard to argue for making the changes. Perhaps there are other cases where the differences are bigger.

comment:49 Changed 2 years ago by git

  • Commit changed from 27f9467b3530468668522e310ce239870d0d023a to 81c7ef473e9c178c8734ab77e89eeebdbd72a38a

Branch pushed to git repo; I updated commit sha1. New commits:

81c7ef4  Use _stack_impl for the work and leave the checks for stack.

comment:50 Changed 2 years ago by git

  • Commit changed from 81c7ef473e9c178c8734ab77e89eeebdbd72a38a to 5c878c728cf05071490446e5e56b3d32d109cb5b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

5c878c7  Use _stack_impl for the work and leave the checks for stack.

comment:51 Changed 2 years ago by tscrim

By using _stack_impl, I could get some extra speed out of stack:

100000 loops, best of 3: 9.13 µs per loop
100000 loops, best of 3: 10.5 µs per loop
100000 loops, best of 3: 2.14 µs per loop
100000 loops, best of 3: 5.9 µs per loop
100000 loops, best of 3: 8.34 µs per loop

The most amazing thing is the speedup of the right kernel matrix. There was a lot of overhead there; removing it gives about a 10x speedup.

If my last changes are good, then positive review.

comment:52 Changed 2 years ago by nbruin

  • Status changed from needs_review to positive_review

Thanks! This looks fine. It seems this is a uniform improvement, and further optimizations would likely need to consider some new ideas and areas, so this is definitely worth merging.

comment:53 Changed 2 years ago by vbraun

  • Branch changed from public/linear_algebra/modn_dense_speedup-15104 to 5c878c728cf05071490446e5e56b3d32d109cb5b
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:54 Changed 2 years ago by jdemeyer

  • Commit 5c878c728cf05071490446e5e56b3d32d109cb5b deleted

For future reference, code like

cdef Py_ssize_t* nonpivots = <Py_ssize_t*>sig_malloc(sizeof(Py_ssize_t)*(ncols-r))

is unsafe for 2 reasons:

  1. sig_malloc might return NULL in the case that no memory is available.
  2. The multiplication sizeof(Py_ssize_t) * (ncols-r) can overflow (this is unlikely to happen in practice for matrices, but it's still a bug).

Luckily, the function check_allocarray from cysignals solves both these problems. The allocation could be written as

cdef Py_ssize_t* nonpivots = <Py_ssize_t*>check_allocarray(ncols - r, sizeof(Py_ssize_t))
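In context, with the matching deallocation (a sketch; ncols and r are assumed to be in scope, as in the routine under discussion):

from cysignals.memory cimport check_allocarray, sig_free

cdef Py_ssize_t* nonpivots = <Py_ssize_t*> check_allocarray(ncols - r, sizeof(Py_ssize_t))
try:
    pass   # ... use nonpivots ...
finally:
    # check_allocarray raises MemoryError instead of returning NULL,
    # so no NULL check is needed; just free the buffer when done
    sig_free(nonpivots)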