Opened 8 years ago
Closed 3 years ago
#12453 closed enhancement (fixed)
Refactor integer vectors using ClonableIntArray Cython data structure
Reported by:  nborie  Owned by:  nborie 

Priority:  major  Milestone:  sage7.4 
Component:  combinatorics  Keywords:  Integer, vectors, cython, clone, Cernay2012 
Cc:  sagecombinat, chapoton, darij  Merged in:  
Authors:  Travis Scrimshaw  Reviewers:  Frédéric Chapoton, Nathann Cohen 
Report Upstream:  N/A  Work issues:  
Branch:  4aaba42 (Commits)  Commit:  4aaba4280940c4defae572c3e16fb213406bc613 
Dependencies:  Stopgaps: 
Description
Why that ? Because :
 Integer Vectors have no element class
 Integer Vectors elements must be immutable and hashable
 Clean the old code with new specification
 Add documentation
 Deal with coeherence beetween extra arguments of the constructor
 ...
Attachments (1)
Change History (74)
comment:1 Changed 7 years ago by
 Status changed from new to needs_review
comment:2 Changed 7 years ago by
 Dependencies set to #10193
comment:3 Changed 7 years ago by
 Status changed from needs_review to needs_work
Please update the patch as it does not apply on sage5.10
patching file sage/combinat/integer_vector_weighted.py Hunk #1 FAILED at 2 Hunk #3 FAILED at 64 Hunk #4 FAILED at 173
comment:4 Changed 7 years ago by
 Dependencies changed from #10193 to #10193 #14101
 Status changed from needs_work to needs_review
comment:5 Changed 7 years ago by
 Type changed from task to enhancement
Changed 7 years ago by
comment:6 Changed 6 years ago by
 Milestone changed from sage5.11 to sage5.12
comment:7 Changed 6 years ago by
Branch version at public/refactor_integer_vectors12453 but trac is responding with a unicode error.
comment:8 Changed 6 years ago by
 Branch set to public/refactor_integer_vectors12453
 Commit set to ef2ddcec6d5a469b982d34af7ff09c7af9500809
New commits:
ef2ddce  #12453: Refactored IntegerVectors to use ClonableIntArray.

comment:9 Changed 6 years ago by
 Commit changed from ef2ddcec6d5a469b982d34af7ff09c7af9500809 to b9f278e0d06b1ad38e231a2ac11381e62b6a2b76
comment:10 Changed 6 years ago by
 Milestone changed from sage6.1 to sage6.2
comment:11 Changed 6 years ago by
 Commit changed from b9f278e0d06b1ad38e231a2ac11381e62b6a2b76 to 12bac19c3a9d720363788baa7e4304fa18f7821f
Branch pushed to git repo; I updated commit sha1. New commits:
12bac19  Merge branch 'develop' into public/refactor_integer_vectors12453

comment:12 Changed 6 years ago by
 Commit changed from 12bac19c3a9d720363788baa7e4304fa18f7821f to 3008ee6a34c023370071bfe1fb324f2b479462ed
comment:13 Changed 6 years ago by
Before this branch
sage: %time _=list(IntegerVectors(5,22)) CPU times: user 1.78 s, sys: 8 ms, total: 1.78 s Wall time: 1.81 s
After the branch
sage: %time _=list(IntegerVectors(5,22)) CPU times: user 19.3 s, sys: 60 ms, total: 19.4 s Wall time: 19.6 s
With a 5lines code
from itertools import combinations def integervectors(n,k): L = n+k1 for x in combinations(range(L),k1): l = [x[i]  x[i1]1 for i in range(1,k1)] l.insert(0,x[0]) l.append(Lx[1]1) yield l sage: %time _=list(integervectors(5,22)) CPU times: user 492 ms, sys: 16 ms, total: 508 ms Wall time: 515 ms
I think this can be classified as a severe regression. Besides, it is probably better to avoid this kind of stuff present in your branch:
for c in IntegerVectors(self.size  k, i1): c = list(c)
If you only want the data, don't generate complex objects in the first place.
Nathann
comment:14 Changed 6 years ago by
 Status changed from needs_review to needs_work
comment:15 Changed 6 years ago by
While this algorithm works (modulo corner cases), it doesn't return them in lex order (which is something we'd want to do). Can you give a variant of it which does yield in lex order?
comment:16 Changed 6 years ago by
Nevermind, I think I have one.
comment:17 Changed 6 years ago by
?
On which instance doesn't it give a lexicographic ordering ? O_o
Nathann
comment:18 Changed 6 years ago by
(1,2) gives: [0, 1] [1, 0]
. In fact, try (3,3)
and it's not something easily tweaked.
comment:19 Changed 6 years ago by
O_o
sage: list(integervectors(1,2)) [[0, 1], [1, 0]] sage: list(integervectors(3,3)) [[0, 0, 3], [0, 1, 2], [0, 2, 1], [0, 3, 0], [1, 0, 2], [1, 1, 1], [1, 2, 0], [2, 0, 1], [2, 1, 0], [3, 0, 0]]
Isn't that the lexicographic ordering ? O_o
sage: list(integervectors(3,3)) == sorted(list(integervectors(3,3))) True
comment:20 followup: ↓ 21 Changed 6 years ago by
I meant revlex.
comment:21 in reply to: ↑ 20 Changed 6 years ago by
By your convention, I mean revlex.
Me and quite a lot of people (wikipedia, dictionaries, phone books, Python) seem to agree on what a lexicographic ordering is :P
It looks like IntegerVector
returns them in decreasing lexicographic order, so I guess that's what you call "revlex" (sorry I always mix colex and revlex).
What about computing the complement of x ? In Python it wastes a lot of time, but the algorithm works.... :P
def integervectors(n,k): L = n+k1 for x in combinations(range(L),L(k1)): x_complement = [] j = 0 for i in range(L): if x[j] == i: if j < L(k1)1: j += 1 continue else: x_complement.append(i) x = x_complement l = [x[i]  x[i1]1 for i in range(1,k1)] l.insert(0,x[0]) l.append(Lx[1]1) yield l sage: list(integervectors(5,5)) == [list(map(int,x)) for x in IntegerVectors(5,5)] True
comment:22 Changed 6 years ago by
 Commit changed from 3008ee6a34c023370071bfe1fb324f2b479462ed to 71858e198f18832b07176233438914f7d90bd35f
Branch pushed to git repo; I updated commit sha1. New commits:
5a88839  Merge branch 'public/refactor_integer_vectors12453' of trac.sagemath.org:sage into public/refactor_integer_vectors12453

ae5db31  Merge branch 'develop' into public/refactor_integer_vectors12453

71858e1  Fixes for integer vectors.

comment:23 Changed 6 years ago by
I thought I had a method but it didn't work. Python's itertool
forces it through lex ordering, so I just did a brute compute and reverse (which is far less than ideal). I'll look into your revised algorithm tonight.
comment:24 Changed 6 years ago by
 Commit changed from 71858e198f18832b07176233438914f7d90bd35f to 0addeb2f6a192c38ef64d733a6a8f9e9e35d4da7
Branch pushed to git repo; I updated commit sha1. New commits:
5f18c77  Merge branch 'develop' into public/refactor_integer_vectors12453

d904ede  Merge branch 'develop' into public/refactor_integer_vectors12453

0fb27ab  Merge branch 'develop' into public/refactor_integer_vectors12453

31402be  Merge branch 'develop' into public/refactor_integer_vectors12453

0addeb2  Merge branch 'public/refactor_integer_vectors12453' of trac.sagemath.org:sage into public/refactor_integer_vectors12453

comment:25 Changed 6 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:26 Changed 5 years ago by
 Commit changed from 0addeb2f6a192c38ef64d733a6a8f9e9e35d4da7 to 822e1cb12097cf96039bb3251b782690fac1a42e
comment:27 followup: ↓ 28 Changed 5 years ago by
 Status changed from needs_work to needs_review
Hey Nathann,
I've implemented your algorithm and everything seems to work. Can you do the rest of the review for this?
comment:28 in reply to: ↑ 27 Changed 5 years ago by
Hello !
I've implemented your algorithm and everything seems to work. Can you do the rest of the review for this?
No, sorry. Your branch contains 14 commits, almost all of which are entitled "Merge with develop". Besides, I see some code in your branch in which you just change the indentation, and so it seems that I should not rereview all that code, but given how your branch is made I have no way to know what needs an actual review.
Besides, I can spend time on the review of any "local" thing (like this enumeration algorithm) and I will do it gladly if you have anything specific in mind, but I hate parents, elements and almost anything that is immutable. I am the last person you want to review code like that.
I would start to make unpleasant reviews like: "It has been reported on Sagedevel that having a cached function return a UniqueRepresentation object resulted in a memory leak. Clearly those integer vectors will be used by cached function, so you are creating a new memory leak until this finally gets fixed".
https://groups.google.com/d/topic/sagedevel/q5uy_lI11jg/discussion
So I'm sorry but you probably want somebody else to review this ticket.
Nathann
comment:29 Changed 5 years ago by
 Milestone changed from sage6.3 to sage6.4
comment:30 Changed 5 years ago by
 Status changed from needs_review to needs_work
needs rebase, des not apply
comment:31 Changed 3 years ago by
 Commit changed from 822e1cb12097cf96039bb3251b782690fac1a42e to 8bc466dd7c129af8eab37ca11e77c399d502b831
Branch pushed to git repo; I updated commit sha1. New commits:
8bc466d  Merge branch 'public/refactor_integer_vectors12453' of git://trac.sagemath.org/sage into public/refactor_integer_vectors12453

comment:32 Changed 3 years ago by
 Milestone changed from sage6.4 to sage7.4
 Status changed from needs_work to needs_review
comment:33 Changed 3 years ago by
doc does not build:
+[dochtml] OSError: [combinat ] /home/worker/sagepatchbot/local/lib/python2.7/sitepackages/sage/combinat/integer_vector.py:docstring of sage.combinat.integer_vector.IntegerVectors_nnondescents:9: WARNING: Inline literal startstring without endstring.
and one should no longer use it.next()
bu instead use next(it)
++ sage: [it.next() for x in range(10)] ++ sage: [it.next() for x in range(10)] ++ sage: [it.next() for x in range(10)]
comment:34 Changed 3 years ago by
 Commit changed from 8bc466dd7c129af8eab37ca11e77c399d502b831 to 17620969ef1be886cb39b70eae1c6a8a3bfeb9ff
comment:35 Changed 3 years ago by
This should fix the doc (although I can't test right now because Sage is recompiling) and fixed the next
.
comment:36 Changed 3 years ago by
Followup, doc built cleanly for me.
comment:37 Changed 3 years ago by
 Status changed from needs_review to needs_work
triggers doctest failures in sandpiles
comment:38 Changed 3 years ago by
 Commit changed from 17620969ef1be886cb39b70eae1c6a8a3bfeb9ff to bab162bcd00364ce8113ddf97d23b2b6d985d688
Branch pushed to git repo; I updated commit sha1. New commits:
bab162b  Fixing doctest failures in sandpile.py.

comment:40 Changed 3 years ago by
Patchbot is green.
comment:41 Changed 3 years ago by
Before
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 190 ms per loop
After
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 3.86 s per loop
which looks really bad..
comment:42 Changed 3 years ago by
Basically all of the time is eaten up by integer_vectors_nk_fast_iter
. I was hoping to line profile it, but that is broken since the IPython upgrade. I'm looking into this.
comment:43 Changed 3 years ago by
So it seems Nathann's algorithm is just slower than the original version, so I'm just reverting it. I would like to %lprun
it... *sigh*
comment:44 Changed 3 years ago by
 Commit changed from bab162bcd00364ce8113ddf97d23b2b6d985d688 to c8d2f6c1d22a85572072faacbb8d87bccdb234ed
Branch pushed to git repo; I updated commit sha1. New commits:
c8d2f6c  Reverting to previous iterator.

comment:45 Changed 3 years ago by
 Commit changed from c8d2f6c1d22a85572072faacbb8d87bccdb234ed to d312832813eaed3853df658a2321c57d9ddbb21a
Branch pushed to git repo; I updated commit sha1. New commits:
d312832  Take advantage of the listreturning iterator.

comment:46 Changed 3 years ago by
Now I get about a 3x slowdown for the iterator over the parent, but considering that this is basically overhead from the parent/element stuff, we won't be able to get much else. Plus we are at such small time scales and if you're iterating over the parent, then this should not be the bottleneck.
However, calling the iterator itself is now about 10% faster than before. So I also update those parts of the codebase which want a honest list back and had them call the iterator directly.
comment:47 Changed 3 years ago by
 Commit changed from d312832813eaed3853df658a2321c57d9ddbb21a to 92360bdd3e33074e9dd72d4078efcf25bc63c70e
comment:48 Changed 3 years ago by
 Commit changed from 92360bdd3e33074e9dd72d4078efcf25bc63c70e to a4be99126f555986e774e0337735d72442eb6479
Branch pushed to git repo; I updated commit sha1. New commits:
a4be991  Merge branch 'develop' into public/refactor_integer_vectors12453

comment:49 Changed 3 years ago by
Patchbot is green.
comment:50 Changed 3 years ago by
 Commit changed from a4be99126f555986e774e0337735d72442eb6479 to a52b821dfe1fd9d3a7a26783f481a649b01ee5fa
Branch pushed to git repo; I updated commit sha1. New commits:
a52b821  Merge branch 'develop' into public/refactor_integer_vectors12453

comment:51 Changed 3 years ago by
 Commit changed from a52b821dfe1fd9d3a7a26783f481a649b01ee5fa to f4f8149bd62d1ed9a88052d1f5f7028bc3870f09
Branch pushed to git repo; I updated commit sha1. New commits:
f4f8149  Merge branch 'develop' into public/refactor_integer_vectors12453

comment:53 Changed 3 years ago by
 Commit changed from f4f8149bd62d1ed9a88052d1f5f7028bc3870f09 to 282f39be1d459a0aa6ce15de5afc5a147dcf9376
Branch pushed to git repo; I updated commit sha1. New commits:
282f39b  We have stricter equality checks based upon the parent.

comment:54 Changed 3 years ago by
 Commit changed from 282f39be1d459a0aa6ce15de5afc5a147dcf9376 to ff49fdf26d4510af8d18b658a97c805e177006a5
Branch pushed to git repo; I updated commit sha1. New commits:
074dc19  Merge branch 'develop' into public/refactor_integer_vectors12453

7401293  Merge branch 'develop' into public/refactor_integer_vectors12453

c6dd087  Merge branch 'develop' into public/refactor_integer_vectors12453

98ecbad  Merge branch 'develop' into public/refactor_integer_vectors12453

ddce440  Merge branch 'develop' into public/refactor_integer_vectors12453

ff49fdf  Merge branch 'develop' into public/refactor_integer_vectors12453

comment:56 Changed 3 years ago by
 Commit changed from ff49fdf26d4510af8d18b658a97c805e177006a5 to 707b5198255b05a67faeb209764b65ed0a1292db
comment:57 Changed 3 years ago by
 Status changed from needs_work to needs_review
I would appreciate it if you could finish the review Frédéric.
comment:58 Changed 3 years ago by
 Commit changed from 707b5198255b05a67faeb209764b65ed0a1292db to 2c221cfebb99a7b3255c61bde71f652e49ed5aba
Branch pushed to git repo; I updated commit sha1. New commits:
2c221cf  Fixing trivial doctest.

comment:59 Changed 3 years ago by
typo "yeilds Python"
I am not sure I am able to do a review. This is rather large a ticket..
Would you please provide some timings ? I am still a lot worried about that.
And explain again what is gained, maybe at the loss of some speed ?
comment:60 Changed 3 years ago by
before
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 193 ms per loop
after
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 622 ms per loop
so this is not so bad, but still a 3 times slowdown. What do we gain for this prize ?
comment:61 Changed 3 years ago by
 Commit changed from 2c221cfebb99a7b3255c61bde71f652e49ed5aba to 0b7514784e9576de0db70f69df099138ebdffa5b
Branch pushed to git repo; I updated commit sha1. New commits:
0b75147  Fixing typo yeilds > yields.

comment:62 Changed 3 years ago by
That is around what I am getting as well. The advantage is that this is now a proper parent class with elements and fits into the rest of Sage. If you want to have the speed but not the parent/element overhead, I provided an iterator for that:
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 238 ms per loop sage: from sage.combinat.integer_vector import integer_vectors_nk_fast_iter sage: %timeit list(integer_vectors_nk_fast_iter(5,22)) 10 loops, best of 3: 63.7 ms per loop
Old:
sage: %timeit list(IntegerVectors(5,22)) 10 loops, best of 3: 76.4 ms per loop
Although if you use int
and Cython, you can get a very fast iterator. Although converting those back to Integer
's was actually slower than working directly with Integer
's.
I now see the problem with why that was taking so long. The class I'm using actually stores the input data as int
's, not Integer
's. So I think I need to change the base class for the elements.
comment:63 Changed 3 years ago by
 Commit changed from 0b7514784e9576de0db70f69df099138ebdffa5b to fa63e8f867108571b8306dbdfaaa4d35fdffde9a
Branch pushed to git repo; I updated commit sha1. New commits:
fa63e8f  Using ClonableArray instead of ClonableIntArray.

comment:64 Changed 3 years ago by
Okay, using ClonableArray
makes things a little bit better:
%timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 209 ms per loop
comment:65 Changed 3 years ago by
 Commit changed from fa63e8f867108571b8306dbdfaaa4d35fdffde9a to 5553a1bd7f7f29215071bff42e500c3fbd63dfe4
Branch pushed to git repo; I updated commit sha1. New commits:
5553a1b  Squeezing some more speed out of the iterator.

comment:66 Changed 3 years ago by
 Commit changed from 5553a1bd7f7f29215071bff42e500c3fbd63dfe4 to c37d077a26d9bc5893d3366a23df887530e2fc0a
Branch pushed to git repo; I updated commit sha1. New commits:
c37d077  Make sure every element in the list is a Sage Integer.

comment:67 Changed 3 years ago by
By not popping and pushing the list (the iterator wants a fixed length) and some reworking, I got it to:
sage: %timeit list(IntegerVectors(5,22)) 1 loop, best of 3: 188 ms per loop
Although I did get some slowdown because rem
was not an Integer
for the entire loop. So I think this is probably as fast as I can get it.
comment:68 Changed 3 years ago by
 Commit changed from c37d077a26d9bc5893d3366a23df887530e2fc0a to 4aaba4280940c4defae572c3e16fb213406bc613
Branch pushed to git repo; I updated commit sha1. New commits:
4aaba42  trac 12453 reviewer's commit

comment:69 Changed 3 years ago by
Looks better now indeed. I have made a reviewer's commit (cosmetic).
Once the bot is green, you can set a positive review.
comment:70 Changed 3 years ago by
 Reviewers set to Frédéric Chapoton, Nathann Cohen
Bot is essentially green, so positive review. Thank you.
comment:71 Changed 3 years ago by
 Status changed from needs_review to positive_review
comment:72 Changed 3 years ago by
 Dependencies #10193 #14101 deleted
FYI I'm only merging tickets whose dependencies are merged git branches.
comment:73 Changed 3 years ago by
 Branch changed from public/refactor_integer_vectors12453 to 4aaba4280940c4defae572c3e16fb213406bc613
 Resolution set to fixed
 Status changed from positive_review to closed
Initial version.