Opened 6 years ago
Closed 5 years ago
#21303 closed defect (fixed)
Make hecke operators not blow up the memory
Reported by:  kartikv  Owned by:  

Priority:  major  Milestone:  sage7.4 
Component:  modular forms  Keywords:  eclib, modular symbols, hecke operators 
Cc:  cremona, kedlaya  Merged in:  
Authors:  Kiran Kedlaya  Reviewers:  David Roe 
Report Upstream:  None of the above  read trac for reasoning.  Work issues:  
Branch:  6c3bac5 (Commits, GitHub, GitLab)  Commit:  6c3bac591bde14d3bd23bb4f983af24a7b8077b6 
Dependencies:  Stopgaps: 
Description (last modified by )
Constructing Hecke operators acting on modular symbols can blow up the memory usage, though it doesn't have to. The eclib code has sparse and dense options for generating Hecke operators, but the sage caller only calls the dense options. The simple solution is to modify the sage caller to add a flag which specifies to use the sparse options: perhaps a better solution would be to intuit whether the resulting matrix is likely to be sparse and do something intelligent.
Change History (34)
comment:1 Changed 6 years ago by
 Description modified (diff)
comment:2 Changed 6 years ago by
comment:3 Changed 6 years ago by
You will also need to write a cython function to convert an eclib sparse matrix to a Sage one. William and I did this for dense matrices in about 2008. The eclib sparese matrix class is written at rather a low level for efficiency with arrays of pointers etc. Ask me if it is not clear how to extract the information from one.
The output function at lne 729 of https://github.com/JohnCremona/eclib/blob/master/libsrc/smat.cc might help; also the associated header file. I suggest that you loop through all the rows and for each one fetch all the (column, value) pairs. One thing to watch out for: if a row contains n nonzero entries then the 0'th entry in the associated col array stores n so that array has size n+1 while the values array has size n. Row and column numbers start from 1 not 0. Good luck.
comment:4 Changed 6 years ago by
 Branch set to u/kedlaya/make_hecke_operators_not_blow_up_the_memory
 Commit set to bcd410a685a80decf97914b056c1058747ee785a
 Status changed from new to needs_review
I got a rather crude version of this up and running, so here's a patch. The crude part is that I ask eclib
for dense vectors one at a time, rather than asking for a whole sparse matrix; but this was easier to wrap (I didn't need to tamper with the include files to expose any functionality) and already avoids an obvious quadratic memory step.
I tried this out on my usual compute server:
sage: time C = CremonaModularSymbols(400001, sign=1) CPU times: user 26.6 s, sys: 119 ms, total: 26.7 s Wall time: 26.7 s sage: time T2a = C.hecke_matrix(2) CPU times: user 17.7 s, sys: 4.9 s, total: 22.6 s Wall time: 22.6 s sage: time T2b = T2a.sage_matrix_over_ZZ(sparse=True) CPU times: user 26.2 s, sys: 83 ms, total: 26.2 s Wall time: 26.3 s sage: del T2a sage: time T2c = C.sparse_hecke_matrix(2) CPU times: user 1min 10s, sys: 548 ms, total: 1min 11s Wall time: 1min 11s sage: time T2b == T2c # True but not instant for sparse matrices CPU times: user 11 s, sys: 650 ms, total: 11.7 s Wall time: 11.7 s True
The regression (48.9s to 71.7s) is annoying; the bottleneck is the line
ans = M(entries=d)
where M
is the matrix space of sparse matrices and d
is the dict mapping coordinate pairs to matrix entries. I had the thought that the matrix constructor must be copying the dict (as is necessary generically) even though in this case it's not required, but I don't think that's the issue:
sage: time dd = copy(d) #too short CPU times: user 960 ms, sys: 73 ms, total: 1.03 s Wall time: 1.03 s sage: time dd = deepcopy(d) #too long CPU times: user 2min 21s, sys: 1.7 s, total: 2min 23s Wall time: 2min 23s
On the memory side (crudely measured by watching top
), the computation of T2b
via T2a
peaks at 5.9g whereas the computation of T2c
peaks at 1.6g, so at least that went according to plan.
And in my intended use case I work modulo some prime, and this is a big win:
sage: time T2d = C.sparse_hecke_matrix(2, base_ring=GF(2)) CPU times: user 15.3 s, sys: 317 ms, total: 15.7 s Wall time: 15.7 s sage: time T2e = C.sparse_hecke_matrix(2, base_ring=GF(next_prime(2^30))) CPU times: user 24.3 s, sys: 607 ms, total: 25 s Wall time: 25 s }
so I'm taken care of.
New commits:
8075a0c  Generate sparse Hecke matrix without dense intermediary

0bcc834  Add base_ring option to sparse_hecke_matrix, use sparse constructor

bd5a8c5  Docstring corrections

bcd410a  More minor edits

comment:5 Changed 6 years ago by
I should add that I have been running this code since I posted it and am observing a memory leak somewhere. Specifically, I've been running this code snippet (for various values of i
):
def zero_eigenvalue_multiplicity(m): #taken from trac #20788 n = 0 m1 = m r = m.dimensions()[0] while True: m2 = m1.extended_echelon_form(subdivide=True) t = r  m2.subdivisions()[0][0] n += t if t == 0 or t == r: return n m3 = m2.subdivision(0,0) m4 = m2.submatrix(0,r,r,r) m5 = m3 * m4.inverse() m1 = m5.submatrix(0,0,rt,rt) r = t for n in range((i5)*1000+1, i*1000, 2): C = CremonaModularSymbols(n, cuspidal=True, sign=1) T2 = C.sparse_hecke_matrix(2, base_ring=GF(2)).dense_matrix() t = (n, zero_eigenvalue_multiplicity(T2), zero_eigenvalue_multiplicity(T2+1))
and observing (via top
) memory usage growing from 200300mb for the initial stages to over 10gb later in the loop. My guess is that there is no serious memory leak in eclib
, and I wouldn't suspect one in m4ri
either; but I'm having trouble coming up with other options, and I'm not up to speed with valgrind
enough to try checking that way.
comment:6 Changed 6 years ago by
When eclib was first put into Sage back in about 2007 it was subjected to rigorous valgrinding by Michael Abshoff, which revealed several issues which were then fixed (with some effort!). But there has been considerable change since then, so I should do that again.
I have made this an Issue for eclib: https://github.com/JohnCremona/eclib/issues/18
comment:7 Changed 6 years ago by
Update: I can reproduce the memory leak behavior by simply repeating the code from #20788 (as reported therein). That is not to say that running valgrind again on eclib
wouldn't be advisable, but I am not specifically claiming a memory leak in the computation of Hecke matrices (though if you do find one, it would certainly help me for it to be fixed).
comment:8 Changed 6 years ago by
Oh, now this is bad news.
sage: C = CremonaModularSymbols(45, cuspidal=True,sign=1) sage: T2a = C.hecke_matrix(2).sage_matrix_over_ZZ() sage: print T2a [1 1 1] [ 0 1 2] [ 0 0 1] sage: T2b = C.sparse_hecke_matrix(2) sage: print T2b [1 1 0] [ 0 1 2] [ 0 0 3]
This is only an issue when cuspidal=True
: the eclib
function heckeop
takes the projection onto the cuspidal subspace into account but heckeop_col
does not.
It looks like s_heckeop
does account for the cuspidal projection, so switching to that should fix this problem.
comment:9 Changed 6 years ago by
Apologies for the inconsistencies. There must be quite a few methods in this homspace class which are no longer used in the main programs and whose implementations have therefore not been kept consistent or tested. Unlike Sage where every single method has its own test, I just have highlevel test programs and there must be a lot of code which is no longer tested, thought of by me as obsolete but not actually deleted. The number of people who have looked at this code in any detail is very small!
comment:10 Changed 6 years ago by
 Commit changed from bcd410a685a80decf97914b056c1058747ee785a to 6fba176c039f8f1f7d9ca8a011f5ede9eed017c0
Branch pushed to git repo; I updated commit sha1. New commits:
6fba176  Use sparse Hecke matrix from eclib, correct results for cuspidal subspace

comment:11 Changed 6 years ago by
I went ahead and switched over to s_heckeop
and this indeed resolves the issue. I am still doing the silly thing of converting a sparse vector into a dense vector before transferring it into Python; that's because I'm not familiar enough with the C++Python interaction to know how to correctly wrap the functions that provide direct access to the underlying dictlike structure of a sparse vector.
Timings are similar, but now on a completely unloaded machine, so more indicative than before:
sage: sage: time C = CremonaModularSymbols(400001, sign=1) CPU times: user 25.4 s, sys: 95 ms, total: 25.5 s Wall time: 25.5 s sage: time T2a = C.hecke_matrix(2) CPU times: user 13.7 s, sys: 4.08 s, total: 17.8 s Wall time: 17.8 s sage: time T2b = T2a.sage_matrix_over_ZZ(sparse=True) CPU times: user 25 s, sys: 59 ms, total: 25.1 s Wall time: 25.1 s sage: time T2c = C.sparse_hecke_matrix(2) CPU times: user 1min 2s, sys: 476 ms, total: 1min 3s Wall time: 1min 3s sage: time T2b == T2c CPU times: user 9.48 s, sys: 633 ms, total: 10.1 s Wall time: 10.1 s True sage: time T2c = C.sparse_hecke_matrix(2, base_ring=GF(2)) CPU times: user 17.4 s, sys: 328 ms, total: 17.7 s Wall time: 17.7 s
comment:12 Changed 5 years ago by
comment:13 Changed 5 years ago by
Note to myself: check to see whether the resolution of #22164 has had any effect on the memory leak reported above.
comment:14 Changed 5 years ago by
 Commit changed from 6fba176c039f8f1f7d9ca8a011f5ede9eed017c0 to 05e52f4f98b6d186e8934ddcb9ac4e415a7b2aef
comment:15 Changed 5 years ago by
 Status changed from needs_review to needs_work
This syntax is deprecated and shouldn't be used anymore:
for j from 1 <= j <= n:
Cython will generate efficient code for range()
if the variables are of some C type (which is the case here since you declared cdef long j
).
comment:16 Changed 5 years ago by
You should not put Python code inside sig_on()
/sig_off()
blocks.
There are two (not mutually exclusive) cases:
(A) If there is any particular nonPython statement that takes a long time or might generate signals, put sig_on()
/sig_off()
around that particular statement.
(B) If you just want to interrupt the loop, put sig_check()
inside that loop.
comment:17 Changed 5 years ago by
In the eclib.pxd
file, you have a duplicate
ctypedef int scalar
It should be declared just once.
comment:18 Changed 5 years ago by
(NOTE: I just checked the Cython programming issues, I didn't actually try to understand the code; I assume that John can do that)
comment:19 followup: ↓ 20 Changed 5 years ago by
I looked at the code. Congratulations for being the first person in history other than me and Luis Figueiredo (a student of Taylor in the 1990s) to read my sparse vector / matrix code!
What you do is go through the rows, extract the i'th row as a sparse vector, convert it to a dense vector, and then go through its entries storing the nonzero ones in the new Sage sparse matrix. This is inefficient as you look at all m*n netries (if the matrix is mxn). It would be better to extract just the nonzero entries in the first place. I can try to write that, or help someone else, but before do I need to know if Cython will have access to the "protected" (in C++ terms) i.e. semiprivate data components of the C++ sparse matrix. If so I can write the necessary lines.
comment:20 in reply to: ↑ 19 Changed 5 years ago by
Replying to cremona:
I need to know if Cython will have access to the "protected" (in C++ terms) i.e. semiprivate data components of the C++ sparse matrix. If so I can write the necessary lines.
Cython essentially just generates C or C++ code. So it won't have any more access than userwritten C++ code. So that means that it cannot access arbitrary protected members.
comment:21 Changed 5 years ago by
That makes it harder. The thing is an array of "rows" (one for each actual row even if the row is zero). We know how many from M.nrows(). For the i'th row we store the number of nonzero entries, the list of columns they are in and the list of the entries. The only difficulty is in getting the 0/1 rebasing correct since C, like Python goes from 0 bu my user interface goes from 1...
I can see that to provide the interface needed will require adding to eclib. Sorry. Meanwhile it would still be simpler to extract just the nonzero entries from the sparse vector which we already extract for each row. But even that requires access to protected members of the svec class (which are essentially dicts containing (index, netry) pairs). So that leaves this:
for i in range(n): for j in range(n): Mij = M.elem(i+1,j+1) if Mij: d[(i, j)] = Mij
I don't expect that to be any better than the current version though since for each i,j the elem() method will do a linear search along the i'th row to see if the j'th entry is present.
Sorry not to have a lowlevel interface!
comment:22 Changed 5 years ago by
A possible lowlevel interface could work like a Python iterator delivering triple (i,j,M[i,j]) with some signal at the end such as (0,0,0). Without doing a lot of work that could easily be provided as a single 3xN matrix where N is one more than the number of nonzero entries. Then the user can loop over that. But such a change to eclib is for the future, and should not delay this.
comment:23 Changed 5 years ago by
 Commit changed from 05e52f4f98b6d186e8934ddcb9ac4e415a7b2aef to 68ba41cd16b2e8f0becbdb86c905230750f0a896
Branch pushed to git repo; I updated commit sha1. New commits:
68ba41c  Corrected some Cython issues

comment:24 followup: ↓ 25 Changed 5 years ago by
 Report Upstream changed from Not yet reported upstream; Will do shortly. to None of the above  read trac for reasoning.
 Status changed from needs_work to needs_review
I fixed the Cython issues (I think).
That said, svec.h
does appear to include some access functions:
// functions to enable iteration over the entries without direct // access to the entries map: map<int,scalar>::const_iterator begin() const {return entries.begin();} map<int,scalar>::const_iterator end() const {return entries.end();} map<int,scalar>::iterator begin() {return entries.begin();} map<int,scalar>::iterator end() {return entries.end();} void erase(int i); // erases v[i]; error if not set int first_index() const {return entries.upper_bound(0)>first;} std::set<int> support() const;
that should avoid creating a dense vector; but I need to learn a bit more about Cython to figure out how to exploit those.
comment:25 in reply to: ↑ 24 Changed 5 years ago by
Replying to kedlaya:
I need to learn a bit more about Cython to figure out how to exploit those.
Cython does support that (I'm assuming that map
means std::map
):
from libcpp.map cimport map cdef extern from "...": cdef cppclass ...: map[int, scalar].iterator begin() map[int, scalar].iterator end()
comment:26 Changed 5 years ago by
 Commit changed from 68ba41cd16b2e8f0becbdb86c905230750f0a896 to 981edb0be237a3a3cc90fa3384d3944ae4755d29
Branch pushed to git repo; I updated commit sha1. New commits:
981edb0  Attempt to directly access svec

comment:27 Changed 5 years ago by
 Status changed from needs_review to needs_work
I tried to use the iterator following a model in the Cython documentation, but...:
[sagelib8.1.beta6] ... [sagelib8.1.beta6] cdef smat M = self.H.s_heckeop(p, dual, verbose) [sagelib8.1.beta6] sig_off() [sagelib8.1.beta6] for i in range(n): [sagelib8.1.beta6] sv = M.row(i+1) [sagelib8.1.beta6] iter = sv.begin() [sagelib8.1.beta6] while iter != sv.end(): [sagelib8.1.beta6] ^ [sagelib8.1.beta6]  [sagelib8.1.beta6] [sagelib8.1.beta6] sage/libs/eclib/homspace.pyx:284:23: Invalid types for '!=' (<error>, iterator)
comment:28 Changed 5 years ago by
Next time, please quote the complete error message:
[sagelib8.1.beta7] Error compiling Cython file: [sagelib8.1.beta7]  [sagelib8.1.beta7] ... [sagelib8.1.beta7] cdef long n = self.dimension() [sagelib8.1.beta7] cdef long i=0 [sagelib8.1.beta7] cdef long j=0 [sagelib8.1.beta7] cdef vec v [sagelib8.1.beta7] cdef svec sv [sagelib8.1.beta7] cdef map[int, scalar].iterator iter [sagelib8.1.beta7] ^ [sagelib8.1.beta7]  [sagelib8.1.beta7] [sagelib8.1.beta7] sage/libs/eclib/homspace.pyx:276:13: 'map' is not a type identifier [sagelib8.1.beta7] [sagelib8.1.beta7] Error compiling Cython file: [sagelib8.1.beta7]  [sagelib8.1.beta7] ... [sagelib8.1.beta7] cdef smat M = self.H.s_heckeop(p, dual, verbose) [sagelib8.1.beta7] sig_off() [sagelib8.1.beta7] for i in range(n): [sagelib8.1.beta7] sv = M.row(i+1) [sagelib8.1.beta7] iter = sv.begin() [sagelib8.1.beta7] while iter != sv.end(): [sagelib8.1.beta7] ^ [sagelib8.1.beta7]  [sagelib8.1.beta7] [sagelib8.1.beta7] sage/libs/eclib/homspace.pyx:284:23: Invalid types for '!=' (<error>, iterator)
You forgot from libcpp.map cimport map
comment:29 Changed 5 years ago by
In fact, Cython does type inference, so you could also just remove the line
cdef map[int, scalar].iterator iter
comment:30 Changed 5 years ago by
 Commit changed from 981edb0be237a3a3cc90fa3384d3944ae4755d29 to d257eabe478f2a4ff92a436afe24f5122e8011dd
comment:31 Changed 5 years ago by
 Status changed from needs_work to needs_review
OK, I think I now have a working interface that avoids instantiating any dense vectors.
comment:32 Changed 5 years ago by
 Branch changed from u/kedlaya/make_hecke_operators_not_blow_up_the_memory to u/roed/make_hecke_operators_not_blow_up_the_memory
comment:33 Changed 5 years ago by
 Commit changed from d257eabe478f2a4ff92a436afe24f5122e8011dd to 6c3bac591bde14d3bd23bb4f983af24a7b8077b6
 Reviewers set to David Roe
 Status changed from needs_review to positive_review
comment:34 Changed 5 years ago by
 Branch changed from u/roed/make_hecke_operators_not_blow_up_the_memory to 6c3bac591bde14d3bd23bb4f983af24a7b8077b6
 Resolution set to fixed
 Status changed from positive_review to closed
The
sage_matrix_over_ZZ
method of internalMatrix
class in theeclib
wrapper does have a flagsparse
(defaults toFalse
), but at this point it is too late; the Hecke matrix was already created as an eclib dense matrix (i.e., a C++ array oflong
's). If I understand correctly, what is really needed is a flag in thehecke_matrix
method that gets passed toeclib
, so that the underlying C++ code switches correctly.