Opened 8 years ago

Closed 3 years ago

#12453 closed enhancement (fixed)

Refactor integer vectors using ClonableIntArray Cython data structure

Reported by: nborie Owned by: nborie
Priority: major Milestone: sage-7.4
Component: combinatorics Keywords: Integer, vectors, cython, clone, Cernay2012
Cc: sage-combinat, chapoton, darij Merged in:
Authors: Travis Scrimshaw Reviewers: Frédéric Chapoton, Nathann Cohen
Report Upstream: N/A Work issues:
Branch: 4aaba42 (Commits) Commit: 4aaba4280940c4defae572c3e16fb213406bc613
Dependencies: Stopgaps:

Description

Why that ? Because :

  • Integer Vectors have no element class
  • Integer Vectors elements must be immutable and hashable
  • Clean the old code with new specification
  • Add documentation
  • Deal with coeherence beetween extra arguments of the constructor
  • ...

Attachments (1)

trac_12453-refactor_integer_vectors-ts.patch (82.6 KB) - added by tscrim 6 years ago.

Download all attachments as: .zip

Change History (74)

comment:1 Changed 7 years ago by tscrim

  • Authors set to Travis Scrimshaw
  • Status changed from new to needs_review

Initial version.

comment:2 Changed 7 years ago by tscrim

  • Dependencies set to #10193

comment:3 Changed 7 years ago by vdelecroix

  • Status changed from needs_review to needs_work

Please update the patch as it does not apply on sage-5.10

patching file sage/combinat/integer_vector_weighted.py
Hunk #1 FAILED at 2
Hunk #3 FAILED at 64
Hunk #4 FAILED at 173

comment:4 Changed 6 years ago by tscrim

  • Dependencies changed from #10193 to #10193 #14101
  • Status changed from needs_work to needs_review

New version (which can be commuted past #14101 and #14772 at minor cost due to doc[string] changes). Also part of #12913.

comment:5 Changed 6 years ago by tscrim

  • Type changed from task to enhancement

comment:6 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

comment:7 Changed 6 years ago by tscrim

Branch version at public/refactor_integer_vectors-12453 but trac is responding with a 'unicode' object not callable error.

Last edited 6 years ago by tscrim (previous) (diff)

comment:8 Changed 6 years ago by tscrim

  • Branch set to public/refactor_integer_vectors-12453
  • Commit set to ef2ddcec6d5a469b982d34af7ff09c7af9500809

New commits:

ef2ddce#12453: Refactored IntegerVectors to use ClonableIntArray.

comment:9 Changed 6 years ago by git

  • Commit changed from ef2ddcec6d5a469b982d34af7ff09c7af9500809 to b9f278e0d06b1ad38e231a2ac11381e62b6a2b76

Branch pushed to git repo; I updated commit sha1. New commits:

eb793ccMerge branch 'develop' into public/refactor_integer_vectors-12453
b9f278eMerge branch 'develop' into public/refactor_integer_vectors-12453

comment:10 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.1 to sage-6.2

comment:11 Changed 6 years ago by git

  • Commit changed from b9f278e0d06b1ad38e231a2ac11381e62b6a2b76 to 12bac19c3a9d720363788baa7e4304fa18f7821f

Branch pushed to git repo; I updated commit sha1. New commits:

12bac19Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:12 Changed 6 years ago by git

  • Commit changed from 12bac19c3a9d720363788baa7e4304fa18f7821f to 3008ee6a34c023370071bfe1fb324f2b479462ed

Branch pushed to git repo; I updated commit sha1. New commits:

04c5239Merge branch 'develop' into public/refactor_integer_vectors-12453
7393d3aMerge branch 'develop' into public/refactor_integer_vectors-12453
3008ee6Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:13 Changed 6 years ago by ncohen

Before this branch

sage: %time _=list(IntegerVectors(5,22))
CPU times: user 1.78 s, sys: 8 ms, total: 1.78 s
Wall time: 1.81 s

After the branch

sage: %time _=list(IntegerVectors(5,22))
CPU times: user 19.3 s, sys: 60 ms, total: 19.4 s
Wall time: 19.6 s

With a 5-lines code

from itertools import combinations
def integervectors(n,k):
    L = n+k-1
    for x in combinations(range(L),k-1):
        l = [x[i] - x[i-1]-1 for i in range(1,k-1)]
        l.insert(0,x[0])
        l.append(L-x[-1]-1)
        yield l

sage: %time _=list(integervectors(5,22))
CPU times: user 492 ms, sys: 16 ms, total: 508 ms
Wall time: 515 ms

I think this can be classified as a severe regression. Besides, it is probably better to avoid this kind of stuff present in your branch:

for c in IntegerVectors(self.size - k, i-1):
    c = list(c)

If you only want the data, don't generate complex objects in the first place.

Nathann

comment:14 Changed 6 years ago by ncohen

  • Status changed from needs_review to needs_work

comment:15 Changed 6 years ago by tscrim

While this algorithm works (modulo corner cases), it doesn't return them in lex order (which is something we'd want to do). Can you give a variant of it which does yield in lex order?

comment:16 Changed 6 years ago by tscrim

Nevermind, I think I have one.

comment:17 Changed 6 years ago by ncohen

?

On which instance doesn't it give a lexicographic ordering ? O_o

Nathann

comment:18 Changed 6 years ago by tscrim

(1,2) gives: [0, 1] [1, 0]. In fact, try (3,3) and it's not something easily tweaked.

comment:19 Changed 6 years ago by ncohen

O_o

sage: list(integervectors(1,2))
[[0, 1], [1, 0]]
sage: list(integervectors(3,3))
[[0, 0, 3],
 [0, 1, 2],
 [0, 2, 1],
 [0, 3, 0],
 [1, 0, 2],
 [1, 1, 1],
 [1, 2, 0],
 [2, 0, 1],
 [2, 1, 0],
 [3, 0, 0]]

Isn't that the lexicographic ordering ? O_o

sage: list(integervectors(3,3)) == sorted(list(integervectors(3,3)))
True

comment:20 follow-up: Changed 6 years ago by tscrim

I meant revlex.

Last edited 6 years ago by tscrim (previous) (diff)

comment:21 in reply to: ↑ 20 Changed 6 years ago by ncohen

By your convention, I mean revlex.

Me and quite a lot of people (wikipedia, dictionaries, phone books, Python) seem to agree on what a lexicographic ordering is :-P

It looks like IntegerVector returns them in decreasing lexicographic order, so I guess that's what you call "revlex" (sorry I always mix colex and revlex).

What about computing the complement of x ? In Python it wastes a lot of time, but the algorithm works.... :-P

def integervectors(n,k):
    L = n+k-1
    for x in combinations(range(L),L-(k-1)):
        x_complement = []
        j = 0
        for i in range(L):
            if x[j] == i:
                if j < L-(k-1)-1:
		    j += 1
		continue
            else:
                x_complement.append(i)
	x = x_complement
        l = [x[i] - x[i-1]-1 for i in range(1,k-1)]
        l.insert(0,x[0])
        l.append(L-x[-1]-1)
        yield l

sage: list(integervectors(5,5)) == [list(map(int,x)) for x in IntegerVectors(5,5)]
True

comment:22 Changed 6 years ago by git

  • Commit changed from 3008ee6a34c023370071bfe1fb324f2b479462ed to 71858e198f18832b07176233438914f7d90bd35f

Branch pushed to git repo; I updated commit sha1. New commits:

5a88839Merge branch 'public/refactor_integer_vectors-12453' of trac.sagemath.org:sage into public/refactor_integer_vectors-12453
ae5db31Merge branch 'develop' into public/refactor_integer_vectors-12453
71858e1Fixes for integer vectors.

comment:23 Changed 6 years ago by tscrim

I thought I had a method but it didn't work. Python's itertool forces it through lex ordering, so I just did a brute compute and reverse (which is far less than ideal). I'll look into your revised algorithm tonight.

comment:24 Changed 6 years ago by git

  • Commit changed from 71858e198f18832b07176233438914f7d90bd35f to 0addeb2f6a192c38ef64d733a6a8f9e9e35d4da7

Branch pushed to git repo; I updated commit sha1. New commits:

5f18c77Merge branch 'develop' into public/refactor_integer_vectors-12453
d904edeMerge branch 'develop' into public/refactor_integer_vectors-12453
0fb27abMerge branch 'develop' into public/refactor_integer_vectors-12453
31402beMerge branch 'develop' into public/refactor_integer_vectors-12453
0addeb2Merge branch 'public/refactor_integer_vectors-12453' of trac.sagemath.org:sage into public/refactor_integer_vectors-12453

comment:25 Changed 6 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:26 Changed 5 years ago by git

  • Commit changed from 0addeb2f6a192c38ef64d733a6a8f9e9e35d4da7 to 822e1cb12097cf96039bb3251b782690fac1a42e

Branch pushed to git repo; I updated commit sha1. New commits:

1501865Merge commit '71858e198f18832b07176233438914f7d90bd35f' into public/refactor_integer_vectors-12453
822e1cbMerge branch 'public/refactor_integer_vectors-12453' of trac.sagemath.org:sage into public/refactor_integer_vectors-12453

comment:27 follow-up: Changed 5 years ago by tscrim

  • Status changed from needs_work to needs_review

Hey Nathann,

I've implemented your algorithm and everything seems to work. Can you do the rest of the review for this?

comment:28 in reply to: ↑ 27 Changed 5 years ago by ncohen

Hello !

I've implemented your algorithm and everything seems to work. Can you do the rest of the review for this?

No, sorry. Your branch contains 14 commits, almost all of which are entitled "Merge with develop". Besides, I see some code in your branch in which you just change the indentation, and so it seems that I should not re-review all that code, but given how your branch is made I have no way to know what needs an actual review.

Besides, I can spend time on the review of any "local" thing (like this enumeration algorithm) and I will do it gladly if you have anything specific in mind, but I hate parents, elements and almost anything that is immutable. I am the last person you want to review code like that.

I would start to make unpleasant reviews like: "It has been reported on Sage-devel that having a cached function return a UniqueRepresentation object resulted in a memory leak. Clearly those integer vectors will be used by cached function, so you are creating a new memory leak until this finally gets fixed".

https://groups.google.com/d/topic/sage-devel/q5uy_lI11jg/discussion

So I'm sorry but you probably want somebody else to review this ticket.

Nathann

comment:29 Changed 5 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:30 Changed 5 years ago by chapoton

  • Status changed from needs_review to needs_work

needs rebase, des not apply

comment:31 Changed 3 years ago by git

  • Commit changed from 822e1cb12097cf96039bb3251b782690fac1a42e to 8bc466dd7c129af8eab37ca11e77c399d502b831

Branch pushed to git repo; I updated commit sha1. New commits:

8bc466dMerge branch 'public/refactor_integer_vectors-12453' of git://trac.sagemath.org/sage into public/refactor_integer_vectors-12453

comment:32 Changed 3 years ago by tscrim

  • Milestone changed from sage-6.4 to sage-7.4
  • Status changed from needs_work to needs_review

comment:33 Changed 3 years ago by chapoton

doc does not build:

+[dochtml] OSError: [combinat ] /home/worker/sage-patchbot/local/lib/python2.7/site-packages/sage/combinat/integer_vector.py:docstring of sage.combinat.integer_vector.IntegerVectors_nnondescents:9: WARNING: Inline literal start-string without end-string.

and one should no longer use it.next() bu instead use next(it)

++ sage: [it.next() for x in range(10)] ++ sage: [it.next() for x in range(10)] ++ sage: [it.next() for x in range(10)]

comment:34 Changed 3 years ago by git

  • Commit changed from 8bc466dd7c129af8eab37ca11e77c399d502b831 to 17620969ef1be886cb39b70eae1c6a8a3bfeb9ff

Branch pushed to git repo; I updated commit sha1. New commits:

8a32e27Merge branch 'develop' into public/refactor_integer_vectors-12453
1762096Fixing docbuilding (hopefully) and Python3 compatibility.

comment:35 Changed 3 years ago by tscrim

This should fix the doc (although I can't test right now because Sage is recompiling) and fixed the next.

comment:36 Changed 3 years ago by tscrim

Followup, doc built cleanly for me.

comment:37 Changed 3 years ago by chapoton

  • Status changed from needs_review to needs_work

triggers doctest failures in sandpiles

comment:38 Changed 3 years ago by git

  • Commit changed from 17620969ef1be886cb39b70eae1c6a8a3bfeb9ff to bab162bcd00364ce8113ddf97d23b2b6d985d688

Branch pushed to git repo; I updated commit sha1. New commits:

bab162bFixing doctest failures in sandpile.py.

comment:39 Changed 3 years ago by tscrim

  • Status changed from needs_work to needs_review

Fixed.

comment:40 Changed 3 years ago by tscrim

Patchbot is green.

comment:41 Changed 3 years ago by chapoton

Before

sage: %timeit list(IntegerVectors(5,22))
1 loop, best of 3: 190 ms per loop

After

sage: %timeit list(IntegerVectors(5,22)) 
1 loop, best of 3: 3.86 s per loop

which looks really bad..

comment:42 Changed 3 years ago by tscrim

Basically all of the time is eaten up by integer_vectors_nk_fast_iter. I was hoping to line profile it, but that is broken since the IPython upgrade. I'm looking into this.

comment:43 Changed 3 years ago by tscrim

So it seems Nathann's algorithm is just slower than the original version, so I'm just reverting it. I would like to %lprun it... *sigh*

comment:44 Changed 3 years ago by git

  • Commit changed from bab162bcd00364ce8113ddf97d23b2b6d985d688 to c8d2f6c1d22a85572072faacbb8d87bccdb234ed

Branch pushed to git repo; I updated commit sha1. New commits:

c8d2f6cReverting to previous iterator.

comment:45 Changed 3 years ago by git

  • Commit changed from c8d2f6c1d22a85572072faacbb8d87bccdb234ed to d312832813eaed3853df658a2321c57d9ddbb21a

Branch pushed to git repo; I updated commit sha1. New commits:

d312832Take advantage of the list-returning iterator.

comment:46 Changed 3 years ago by tscrim

Now I get about a 3x slowdown for the iterator over the parent, but considering that this is basically overhead from the parent/element stuff, we won't be able to get much else. Plus we are at such small time scales and if you're iterating over the parent, then this should not be the bottleneck.

However, calling the iterator itself is now about 10% faster than before. So I also update those parts of the codebase which want a honest list back and had them call the iterator directly.

comment:47 Changed 3 years ago by git

  • Commit changed from d312832813eaed3853df658a2321c57d9ddbb21a to 92360bdd3e33074e9dd72d4078efcf25bc63c70e

Branch pushed to git repo; I updated commit sha1. New commits:

60070a9Merge branch 'develop' into public/refactor_integer_vectors-12453
92360bdFixing doctests from experiments.

comment:48 Changed 3 years ago by git

  • Commit changed from 92360bdd3e33074e9dd72d4078efcf25bc63c70e to a4be99126f555986e774e0337735d72442eb6479

Branch pushed to git repo; I updated commit sha1. New commits:

a4be991Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:49 Changed 3 years ago by tscrim

Patchbot is green.

comment:50 Changed 3 years ago by git

  • Commit changed from a4be99126f555986e774e0337735d72442eb6479 to a52b821dfe1fd9d3a7a26783f481a649b01ee5fa

Branch pushed to git repo; I updated commit sha1. New commits:

a52b821Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:51 Changed 3 years ago by git

  • Commit changed from a52b821dfe1fd9d3a7a26783f481a649b01ee5fa to f4f8149bd62d1ed9a88052d1f5f7028bc3870f09

Branch pushed to git repo; I updated commit sha1. New commits:

f4f8149Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:52 Changed 3 years ago by tscrim

  • Cc chapoton darij added

Rebased on 7.4.beta6.

comment:53 Changed 3 years ago by git

  • Commit changed from f4f8149bd62d1ed9a88052d1f5f7028bc3870f09 to 282f39be1d459a0aa6ce15de5afc5a147dcf9376

Branch pushed to git repo; I updated commit sha1. New commits:

282f39bWe have stricter equality checks based upon the parent.

comment:54 Changed 3 years ago by git

  • Commit changed from 282f39be1d459a0aa6ce15de5afc5a147dcf9376 to ff49fdf26d4510af8d18b658a97c805e177006a5

Branch pushed to git repo; I updated commit sha1. New commits:

074dc19Merge branch 'develop' into public/refactor_integer_vectors-12453
7401293Merge branch 'develop' into public/refactor_integer_vectors-12453
c6dd087Merge branch 'develop' into public/refactor_integer_vectors-12453
98ecbadMerge branch 'develop' into public/refactor_integer_vectors-12453
ddce440Merge branch 'develop' into public/refactor_integer_vectors-12453
ff49fdfMerge branch 'develop' into public/refactor_integer_vectors-12453

comment:55 Changed 3 years ago by chapoton

  • Status changed from needs_review to needs_work

does not apply

comment:56 Changed 3 years ago by git

  • Commit changed from ff49fdf26d4510af8d18b658a97c805e177006a5 to 707b5198255b05a67faeb209764b65ed0a1292db

Branch pushed to git repo; I updated commit sha1. New commits:

0feb74dMerge branch 'develop' into public/refactor_integer_vectors-12453
707b519Merge branch 'develop' into public/refactor_integer_vectors-12453

comment:57 Changed 3 years ago by tscrim

  • Status changed from needs_work to needs_review

I would appreciate it if you could finish the review Frédéric.

comment:58 Changed 3 years ago by git

  • Commit changed from 707b5198255b05a67faeb209764b65ed0a1292db to 2c221cfebb99a7b3255c61bde71f652e49ed5aba

Branch pushed to git repo; I updated commit sha1. New commits:

2c221cfFixing trivial doctest.

comment:59 Changed 3 years ago by chapoton

typo "yeilds Python"

I am not sure I am able to do a review. This is rather large a ticket..

Would you please provide some timings ? I am still a lot worried about that.

And explain again what is gained, maybe at the loss of some speed ?

comment:60 Changed 3 years ago by chapoton

before

sage: %timeit list(IntegerVectors(5,22))
1 loop, best of 3: 193 ms per loop

after

sage: %timeit list(IntegerVectors(5,22)) 
1 loop, best of 3: 622 ms per loop

so this is not so bad, but still a 3 times slowdown. What do we gain for this prize ?

comment:61 Changed 3 years ago by git

  • Commit changed from 2c221cfebb99a7b3255c61bde71f652e49ed5aba to 0b7514784e9576de0db70f69df099138ebdffa5b

Branch pushed to git repo; I updated commit sha1. New commits:

0b75147Fixing typo yeilds -> yields.

comment:62 Changed 3 years ago by tscrim

That is around what I am getting as well. The advantage is that this is now a proper parent class with elements and fits into the rest of Sage. If you want to have the speed but not the parent/element overhead, I provided an iterator for that:

sage: %timeit list(IntegerVectors(5,22))
1 loop, best of 3: 238 ms per loop
sage: from sage.combinat.integer_vector import integer_vectors_nk_fast_iter
sage: %timeit list(integer_vectors_nk_fast_iter(5,22))
10 loops, best of 3: 63.7 ms per loop

Old:

sage: %timeit list(IntegerVectors(5,22))
10 loops, best of 3: 76.4 ms per loop

Although if you use int and Cython, you can get a very fast iterator. Although converting those back to Integer's was actually slower than working directly with Integer's.

I now see the problem with why that was taking so long. The class I'm using actually stores the input data as int's, not Integer's. So I think I need to change the base class for the elements.

comment:63 Changed 3 years ago by git

  • Commit changed from 0b7514784e9576de0db70f69df099138ebdffa5b to fa63e8f867108571b8306dbdfaaa4d35fdffde9a

Branch pushed to git repo; I updated commit sha1. New commits:

fa63e8fUsing ClonableArray instead of ClonableIntArray.

comment:64 Changed 3 years ago by tscrim

Okay, using ClonableArray makes things a little bit better:

%timeit list(IntegerVectors(5,22))
1 loop, best of 3: 209 ms per loop

comment:65 Changed 3 years ago by git

  • Commit changed from fa63e8f867108571b8306dbdfaaa4d35fdffde9a to 5553a1bd7f7f29215071bff42e500c3fbd63dfe4

Branch pushed to git repo; I updated commit sha1. New commits:

5553a1bSqueezing some more speed out of the iterator.

comment:66 Changed 3 years ago by git

  • Commit changed from 5553a1bd7f7f29215071bff42e500c3fbd63dfe4 to c37d077a26d9bc5893d3366a23df887530e2fc0a

Branch pushed to git repo; I updated commit sha1. New commits:

c37d077Make sure every element in the list is a Sage Integer.

comment:67 Changed 3 years ago by tscrim

By not popping and pushing the list (the iterator wants a fixed length) and some reworking, I got it to:

sage: %timeit list(IntegerVectors(5,22))
1 loop, best of 3: 188 ms per loop

Although I did get some slowdown because rem was not an Integer for the entire loop. So I think this is probably as fast as I can get it.

comment:68 Changed 3 years ago by git

  • Commit changed from c37d077a26d9bc5893d3366a23df887530e2fc0a to 4aaba4280940c4defae572c3e16fb213406bc613

Branch pushed to git repo; I updated commit sha1. New commits:

4aaba42trac 12453 reviewer's commit

comment:69 Changed 3 years ago by chapoton

Looks better now indeed. I have made a reviewer's commit (cosmetic).

Once the bot is green, you can set a positive review.

comment:70 Changed 3 years ago by tscrim

  • Reviewers set to Frédéric Chapoton, Nathann Cohen

Bot is essentially green, so positive review. Thank you.

comment:71 Changed 3 years ago by tscrim

  • Status changed from needs_review to positive_review

comment:72 Changed 3 years ago by vbraun

  • Dependencies #10193 #14101 deleted

FYI I'm only merging tickets whose dependencies are merged git branches.

comment:73 Changed 3 years ago by vbraun

  • Branch changed from public/refactor_integer_vectors-12453 to 4aaba4280940c4defae572c3e16fb213406bc613
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.