Ticket #12029 (closed enhancement: fixed)

Opened 19 months ago

Last modified 18 months ago

Fast conversion of ClonableIntArray to list

Reported by: SimonKing Owned by: sage-combinat
Priority: major Milestone: sage-4.8
Component: combinatorics Keywords:
Cc: sage-combinat, nborie Work issues:
Report Upstream: N/A Reviewers: Florent Hivert, Simon King
Authors: Simon King, Florent Hivert Merged in: sage-4.8.alpha2
Dependencies: Stopgaps:

Description (last modified by SimonKing) (diff)

I think the following is too slow:

sage: from sage.structure.list_clone import IncreasingIntArrays
sage: I = IncreasingIntArrays()(range(1000))
sage: timeit("L = list(I)", number=10000)
10000 loops, best of 3: 41.8 µs per loop

My patch adds a method .list() (I hope this is the fastest way of converting a C-int array into a Python list - Cython experts are welcome to find something better), and it adds an __iter__() method that relies on the .list() method.

Note that I tried to have an __iter__ method that does not call list() but works on the C-array (this is now possible with the new Cython version), but it turned out to be not faster.

Here are the timings with my patch

sage: from sage.structure.list_clone import IncreasingIntArrays
sage: I = IncreasingIntArrays()(range(1000))
sage: timeit("L = I.list()", number=10000)
10000 loops, best of 3: 19.4 µs per loop
sage: timeit("L = list(I)", number=10000)
10000 loops, best of 3: 32.9 µs per loop

Apply

trac12029_clonable_int_array_2_list.patch Download

Attachments

list_speedup.patch Download (1.2 KB) - added by hivert 19 months ago.
trac12029_clonable_int_array_2_list.patch Download (2.2 KB) - added by SimonKing 19 months ago.
Iteration over ClonableIntArray?, including Florent's speed-up

Change History

comment:1 follow-up: ↓ 2 Changed 19 months ago by hivert

Hi Simon,

A few remarks:

  • The .list() method is not a Python convention, it is a Sage-Combinat (and maybe Sage) convention. Isn't it ?
  • Since the new Cython allows it. Did you try to define an iterator (using yield) without relying on the list ? It could be faster.
  • Also I'm not sure using append to build the list is a good idea (due to possible reallocation). Si you know the list length from the beginning, why not allocation the list and filling it with a loop ? I'm not sure if it is doable optimally in Cython without using the C-API function PyList_New and the macro PyList_SETITEM.

comment:2 in reply to: ↑ 1 ; follow-up: ↓ 3 Changed 19 months ago by SimonKing

  • Status changed from new to needs_review

Replying to hivert:

  • The .list() method is not a Python convention, it is a Sage-Combinat (and maybe Sage) convention. Isn't it ?

Yes. But there are different places in Sage where .list() is used, and I think if there is a faster way to return a list than via an iterator then it should at least be offered.

  • Since the new Cython allows it. Did you try to define an iterator (using yield) without relying on the list ? It could be faster.

As I stated in the ticket description: I did try, and it was not faster. To be concrete: Here is the code that I tested.

    def __iter__(self):
        """
        Iterate over the items of self.
        ...
        """
        cdef int i
        for i from 0<=i<self._len:
            yield self._list[i]

And here is the benchmark:

sage: from sage.structure.list_clone import IncreasingIntArrays
sage: I = IncreasingIntArrays()(range(1000))
sage: timeit("L = list(I)", number=10000)
10000 loops, best of 3: 34.3 µs per loop

  • Also I'm not sure using append to build the list is a good idea (due to possible reallocation).

Appending to a list - if the list is cdefined - is a very quick operation in Cython.

Si you know the list length from the beginning, why not allocation the list and filling it with a loop ?

Can one allocate a Python list without filling it?

comment:3 in reply to: ↑ 2 ; follow-up: ↓ 4 Changed 19 months ago by hivert

Replying to SimonKing:

  • Since the new Cython allows it. Did you try to define an iterator (using yield) without relying on the list ? It could be faster.

As I stated in the ticket description: I did try, and it was not faster. To be concrete: Here is the code that I tested.

Sorry I read it to fast !!!

  • Also I'm not sure using append to build the list is a good idea (due to possible reallocation).

Appending to a list - if the list is cdefined - is a very quick operation in Cython.

Si you know the list length from the beginning, why not allocation the list and filling it with a loop ?

Can one allocate a Python list without filling it?

From the C-API you can do it. Of course you must fill it (at least with None(s)) before returning it to python. See  http://docs.python.org/c-api/list.html and in particular the note after PyList_New. I don't know if it is doable directly in Cython without calling the C-API.

Florent

comment:4 in reply to: ↑ 3 ; follow-up: ↓ 5 Changed 19 months ago by hivert

Replying to hivert:

Can one allocate a Python list without filling it?

From the C-API you can do it. Of course you must fill it (at least with None(s)) before returning it to python. See  http://docs.python.org/c-api/list.html and in particular the note after PyList_New. I don't know if it is doable directly in Cython without calling the C-API.

Hi Simon !

Sorry for replying to myself but I should have said from the very beginning that I'm not a Cython expert ! I'm currently trying the PyList_New version.

Florent

comment:5 in reply to: ↑ 4 ; follow-up: ↓ 6 Changed 19 months ago by SimonKing

Replying to hivert:

I'm currently trying the PyList_New version.

So did I, but currently I am getting segfaults. I guess that is because the references produced by PyList_SetItem are borrowed.

Best regards,

Simon

comment:6 in reply to: ↑ 5 Changed 19 months ago by hivert

Replying to SimonKing:

Replying to hivert:

I'm currently trying the PyList_New version.

So did I, but currently I am getting segfaults. I guess that is because the references produced by PyList_SetItem are borrowed.

Same problem here !!! :-)

Florent

comment:7 Changed 19 months ago by hivert

Hi Simon,

If correct with respect to memory management, I uploaded a patch which when applied over yours gains some more speed:

This is before my patch:

sage: from sage.structure.list_clone import IncreasingIntArrays
sage: I = IncreasingIntArrays()(range(1000))
sage: timeit("L = I.list()", number=10000)
10000 loops, best of 3: 18.1 µs per loop

This is after my patch:

sage: from sage.structure.list_clone import IncreasingIntArrays
sage: I = IncreasingIntArrays()(range(1000))
sage: timeit("L = I.list()", number=10000)
10000 loops, best of 3: 14 µs per loop

So this is more 20 percents faster... After having a Cython expert check the reference counting stuff, this is probably worth including.

What do you think ?

Cheers,

Florent

comment:8 follow-up: ↓ 9 Changed 19 months ago by SimonKing

While you were posting here, I was asking on  sage-devel, posting a code that was very similar to yours. Only difference: I did not use o = PyInt_FromLong(self._list[i]).

Interesting that this is so fast. I need to catch some sleep now, however...

comment:9 in reply to: ↑ 8 ; follow-ups: ↓ 10 ↓ 11 Changed 19 months ago by hivert

Replying to SimonKing:

While you were posting here, I was asking on  sage-devel, posting a code that was very similar to yours. Only difference: I did not use o = PyInt_FromLong(self._list[i]).

I just got confirmation (on cython-users) that INCREF is correct where I put it.

Interesting that this is so fast. I need to catch some sleep now, however...

Me too ! Sorry for not being around Paris just when you come.

Cheers,

Florent

comment:10 in reply to: ↑ 9 Changed 19 months ago by SimonKing

Replying to hivert:

I just got confirmation (on cython-users) that INCREF is correct where I put it.

I was putting the INCREF in precisely the same location as you did. Only difference to the code that I tried (getting segfaults): You did PyInt_FromLong before that. And I don't see why that conversion prevents the segfault.

Cheers,

Simon

comment:11 in reply to: ↑ 9 Changed 19 months ago by hivert

Interesting that this is so fast. I need to catch some sleep now, however...

Me too ! Sorry for not being around Paris just when you come.

Just before I go to bed, Mark Florisson noticed on cython-users that

return [self._list[i] for i in range(self._len)]

Should be as fast as the former append version.

Cheers,

Florent

Changed 19 months ago by hivert

comment:12 Changed 19 months ago by hivert

Hi Simon,

Looking at the C code I figured out that replacing

cdef list L = PyList_New(self._len)

by

cdef list L = <list> PyList_New(self._len)

allows to gain a few more speed:

sage: %timeit I.list()
625 loops, best of 3: 13 µs per loop

So one less microsecond. Can you fold my patch and review it. I'm Ok with your changes.

Cheers,

Florent

Changed 19 months ago by SimonKing

Iteration over ClonableIntArray?, including Florent's speed-up

comment:13 follow-up: ↓ 14 Changed 19 months ago by SimonKing

  • Status changed from needs_review to positive_review
  • Reviewers set to Florent Hivert, Simon King
  • Description modified (diff)
  • Authors changed from Simon King to Simon King, Florent Hivert

I folded Florent's patch and posted the combined patch under the original name of my patch. I hope you don't mind.

I can confirm that using the C-Api gives an additional speed-up. The new doctest from Florent's patch tests against a case that was critical during development (it would result in a segfault without Florent's PyInt_FromLong).

All doctests pass, even when using the new Cython version. Therefore, I can give Florent's patch a positive review. He gave my patch a positive review as well. So, that's done.

Apply trac12029_clonable_int_array_2_list.patch

comment:14 in reply to: ↑ 13 Changed 19 months ago by hivert

Dear Simon,

All doctests pass, even when using the new Cython version. Therefore, I can give Florent's patch a positive review. He gave my patch a positive review as well. So, that's done.

Excellent ! I'm glad to have you as one of my patch co-author, if this wasn't already the case. Thanks for the review.

For the record  This thread on cython-users indicate that the choosen solution seems to be the correct one. They are starting discussing optimizing list comprehension in Cython to handle our case.

Cheers,

Florent

comment:15 Changed 18 months ago by jdemeyer

  • Status changed from positive_review to closed
  • Resolution set to fixed
  • Merged in set to sage-4.8.alpha2
Note: See TracTickets for help on using tickets.