Opened 7 years ago

Closed 6 years ago

Last modified 5 years ago

#13352 closed enhancement (fixed)

Running time improvement of the bitset_len method

Reported by: dcoudert Owned by: jason
Priority: major Milestone: sage-5.13
Component: misc Keywords:
Cc: ncohen, leif Merged in: sage-5.13.rc0
Authors: David Coudert, Jeroen Demeyer Reviewers: David Coudert
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by dcoudert)

This patch improves the running time of the bitset_len method by using mpn_popcount().

Before:

sage: B = Bitset('10'*10000)
sage: len(B)
10000
sage: %timeit len(B)
1000 loops, best of 3: 245 us per loop

After:

sage: B = Bitset('10'*10000)
sage: len(B)
10000
sage: %timeit len(B)
100000 loops, best of 3: 2.52 us per loop

This patch also reorganizes the bitset files, including deleting the sage/misc/bitset_pxd.pxi file.

APPLY:

Attachments (6)

trac_13352-bitcount-in-c.patch (2.0 KB) - added by dcoudert 7 years ago.
trials not working
trac_13352_bitset_len.patch (10.8 KB) - added by dcoudert 7 years ago.
trac_13352_bitset_len_v2.patch (1.2 KB) - added by dcoudert 6 years ago.
trac_13352_bitset_len_v3.patch (4.9 KB) - added by dcoudert 6 years ago.
a.c (1.1 KB) - added by ncohen 6 years ago.
trac_13352_bitset_len_v4.patch (56.5 KB) - added by jdemeyer 6 years ago.

Download all attachments as: .zip

Change History (57)

comment:1 Changed 7 years ago by dcoudert

  • Status changed from new to needs_review

comment:2 follow-up: Changed 7 years ago by leif

I'd rather "implement" the functions in C (where you can use the C preprocessor and __builtin_popcount() if available) and write a thin Cython wrapper.

GCC e.g. chooses efficient machine implementations (probably a single instruction), depending on the (configured or specified) target and perhaps also further compiler flags.

Note that GMP has functions like mpz_popcount() for mpz_ts as well.


Not sure what the parallel refers to ... ;-)

Changed 7 years ago by dcoudert

trials not working

comment:3 in reply to: ↑ 2 ; follow-up: Changed 7 years ago by dcoudert

  • Description modified (diff)

Replying to leif:

I'd rather "implement" the functions in C (where you can use the C preprocessor and __builtin_popcount() if available) and write a thin Cython wrapper.

GCC e.g. chooses efficient machine implementations (probably a single instruction), depending on the (configured or specified) target and perhaps also further compiler flags.

As reported in https://groups.google.com/forum/#!topic/sage-devel/GnudTUwbGG4, I tried to implement the function in C with __builtin_popcount(). I was able to compile successfully, but then sage was not working anymore. See attached patch trac_13352-bitcount-in-c.patch.

Note that GMP has functions like mpz_popcount() for mpz_ts as well.

It could be another solution, but I have not been able yet to find how to access the functions used for int behind?

Not sure what the parallel refers to ... ;-)

In fact, the 4 lines n = __COUNT32__(n, 0) can be executed in parallel and gcc is sometimes able to optimize the sequence of instructions. For instance, all methods have similar running times on my macbook air, but parallel methods are slower on my old PC (system still in 32-bits...).

Last edited 7 years ago by dcoudert (previous) (diff)

comment:4 Changed 7 years ago by dcoudert

  • Description modified (diff)

comment:5 in reply to: ↑ 3 Changed 7 years ago by leif

Replying to dcoudert:

Replying to leif:

Not sure what the parallel refers to ... ;-)

In fact, the 4 lines n = __COUNT32__(n, 0) can be executed in parallel

Well, obviously not, since you have the data dependence on n. ;-)


Btw., you should use UINT32_C(1) instead of 0x1u etc., and e.g. UINT32_MAX instead of (<uint32_t>-1) (Cython code) etc.

The types of the shift parameters (of __TWO??__()) are slightly overdim'ed...

comment:6 Changed 7 years ago by dcoudert

I'm able to import UINT32_MAX, but not UINT32_C.

  • with from libc.stdint cimport UINT32_C: it is apparently not imported
  • with from libc.stdint import UINT32_C: it compiles, but sage crashes with weird messages

Any special trick?

Changed 7 years ago by dcoudert

comment:7 Changed 6 years ago by jdemeyer

  • Milestone changed from sage-5.11 to sage-5.12

Changed 6 years ago by dcoudert

comment:8 Changed 6 years ago by dcoudert

  • Cc ncohen added
  • Description modified (diff)

Hello,

Since I have now understood how to use __builtin_popcountl (not obvious when you don't know), I have uploaded a new version of the patch.

before:

sage: B = Bitset('00'*10000+'1')
sage: len(B)
1
sage: %timeit len(B)
1000000 loops, best of 3: 268 ns per loop
sage: B = Bitset('10'*10000)
sage: len(B)
10000
sage: %timeit len(B)
1000 loops, best of 3: 245 us per loop
sage: B = Bitset('10'*1000000)
sage: len(B)
1000000
sage: %timeit len(B)
10 loops, best of 3: 24.1 ms per loop

after:

sage: B = Bitset('00'*10000+'1')
sage: len(B)
1
sage: %timeit len(B)
1000000 loops, best of 3: 275 ns per loop
sage: B = Bitset('10'*10000)
sage: len(B)
10000
sage: %timeit len(B)
100000 loops, best of 3: 2.52 us per loop
sage: B = Bitset('10'*1000000)
sage: len(B)
1000000
sage: %timeit len(B)
1000 loops, best of 3: 236 us per loop

I'm wondering if I should add a wrapper in case the __builtin_popcountl method is not available. This is not difficult, but I don't know which environnement variables should be tested.

Thanks.

comment:9 Changed 6 years ago by dcoudert

  • Cc leif added
  • Description modified (diff)

I have now implemented the function in C in file popcount.h with a small wrapper. The result is a bit slower than without wrapper, but still way faster than previous method.

sage: B = Bitset('00'*10000+'1')
sage: len(B)
1
sage: %timeit len(B)
1000000 loops, best of 3: 361 ns per loop
sage: B = Bitset('10'*10000)
sage: %timeit len(B)
100000 loops, best of 3: 3.03 us per loop
sage: B = Bitset('10'*1000000)
sage: len(B)
1000000
sage: %timeit len(B)
1000 loops, best of 3: 287 us per loop

I have used the _WIN32 flag, but I'm not sure it's the good one. So if one knows the best flag to use to detect when the __builtin_popcount method is not available, please let me know.

comment:10 Changed 6 years ago by dcoudert

Hello,

after browsing the web, I came up with this new version. The current version tests is the cpu has builtin support. If yes, we use builtin version. Else, we use hakmem methods.

We don't need all the methods for bitset_len but it could be useful for other usage.

Let me know if it is OK.

comment:11 Changed 6 years ago by ncohen

HMmmmmmmmm.... I'm trying to understand the code and it's much more complicated than what I actually read. Funny, looks like you can have as much fun with GCC than with Python, and decide how stuff is to be run at run time instead of compile time :-P

This being said, if I understand well what "ifunc" does I do not see how the popcount instruction will be used when available. To me it looks like you should make sure that the file is compiled with the -mpopcnt GCC flag (so that __builtin_popcnt is always replaced by the popcount instruction even if it does not exist), and that this ifunc trick means to avoid using this instruction if it is detected that, at runtime, it is not available.

This being said, it is a very nice trick O_o

Nathann

comment:12 Changed 6 years ago by ncohen

I would also say that those 2 blocks

__attribute__ ((__target__ ("popcnt"))) 
 static unsigned int builtin_popcount(unsigned int w){ 
 return (sizeof(unsigned int)==4? __builtin_popcount(w) : __builtin_popcountl(w)); 
 } 

are mistakes. Whatever happens, regardless of the architecture, __builtin_popcount works on int type, and __builtin_popcountl works on long types.

Also, perhaps you should credit Cristian Rodríguez somewhere :-P

Nathann

Changed 6 years ago by dcoudert

comment:13 Changed 6 years ago by dcoudert

Right. I have simplified the code and add credits to Cristian Rodriguez.

comment:14 Changed 6 years ago by ncohen

Here's a slightly cleaner version which does not use ifunc but only G++'s multiversionning. C's looking very much like Python these days :-PPPP

Nice links :

Nathann

Changed 6 years ago by ncohen

comment:15 Changed 6 years ago by dcoudert

Unfortunately, I'm unable to use this for the patch because the "default" keyword is not recognized.

comment:16 Changed 6 years ago by ncohen

Hmmmmmm... Well, it seems that this feature is only available in gcc 4.8+. I asked Volker for his advice on this ticket (or rather on the .c file I uploaded), and he told me that we should check for both gcc 4.8+ and x86 (ifdef __i386__).

Besides, he saw on that link (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36041, though I can't find out where on the page) that GCC was about to patch that, and improve the generic version of __builtin_gcc. If that's the case, perhaps we should just use theirs, waiting for the patch to be merged with a future gcc release. Otherwise we can use the MIT HAKMEM version indeed.

I'll try to send a version with both codes today. And we'll remove the old one in a couple of years :-P

Nathann

comment:17 Changed 6 years ago by ncohen

Hey I don't get it... In your patch you also use a attribute/target with popcount. You can't compile my code yet you can compile yours ? And that means that they added this attribute/target feature before adding the "default" keyword ? O_o

Nathann

comment:18 Changed 6 years ago by dcoudert

Apparently yes. Have you tried to modify the patch and compile on your own computer?

comment:19 Changed 6 years ago by ncohen

I'm trying, but it takes forever to compile. What happens if you replace "default" with "" in the .c file ?

Nathann

comment:20 Changed 6 years ago by ncohen

Arg. This "" fails :-/

Nathann

comment:21 Changed 6 years ago by ncohen

HMmmmmmm O_o

What I get when running tests is :

Traceback (most recent call last):
  File "/home/ncohen/.Sage/local/bin/sage-runtests", line 79, in <module>
    from sage.doctest.control import DocTestController
  File "/home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/doctest/control.py", line 27, in <module>
    from sources import FileDocTestSource, DictAsObject
  File "/home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/doctest/sources.py", line 27, in <module>
    from util import NestedName
  File "/home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/doctest/util.py", line 23, in <module>
    from sage.misc.misc import walltime, cputime
  File "/home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/misc/misc.py", line 565, in <module>
    from sage.misc.misc_c import prod, running_total, balanced_sum, is_64_bit, is_32_bit
ImportError: /home/ncohen/.Sage/local/lib/python2.7/site-packages/sage/misc/misc_c.so: undefined symbol: __builtin_popcount

Nathann

comment:22 Changed 6 years ago by ncohen

(that was with your patch applied)

comment:23 Changed 6 years ago by dcoudert

so you mean with patch http://trac.sagemath.org/raw-attachment/ticket/13352/trac_13352_bitset_len_v3.patch as it is or a modified version ?

When I try to replace my patch with what you have put in a.c file, I get:

In file included from sage/coding/binary_code.c:319:0:
./sage/misc/popcount.h:107:3: error: attribute(target("default")) is unknown
./sage/misc/popcount.h:123:3: error: attribute(target("default")) is unknown
./sage/misc/popcount.h:128:14: error: redefinition of ‘popcount’
./sage/misc/popcount.h:123:14: note: previous definition of ‘popcount’ was here
./sage/misc/popcount.h:132:14: error: redefinition of ‘popcountl’
./sage/misc/popcount.h:107:14: note: previous definition of ‘popcountl’ was here
error: command 'gcc' failed with exit status 1
Error installing modified sage library code.

comment:24 Changed 6 years ago by ncohen

I meant your patch, without modifications. As for the errors you get, I think it's because the file gets compiled with gcc instead of g++. Volker said that it should be sufficient to make it a .cc file, and that GCC would take the hint and use the right compiler instead. It does work when outside of Sage :

~$ cp a.cc a.c
~$ gcc a.c -o a
a.c:31:12: error: redefinition of ‘popcount’
 inline int popcount(unsigned int i){
            ^
a.c:26:12: note: previous definition of ‘popcount’ was here
 inline int popcount(unsigned int i){
            ^
a.c:35:12: error: redefinition of ‘popcountl’
 inline int popcountl(unsigned long i){
            ^
a.c:10:12: note: previous definition of ‘popcountl’ was here
 inline int popcountl(unsigned long i){
            ^
~$ gcc a.cc -o a
~$ 

Though there does not seem to be anything wrong with "default" on my computer. Which version of gcc are you using ? It's probably < 4.8.

Nathann

comment:25 Changed 6 years ago by dcoudert

gcc (GCC) 4.6.3 20120306.

comment:26 Changed 6 years ago by ncohen

Helloooooooo ! Just to mention that I have been fighting with multiversionning in Sage+GCC for two days now, and that I can't seem to make it work. It works outside of Sage, it segfaults inside of Sage and I have no idea why.

Oh, and there is a #DEFINE NO_INLINE among the default preprocessor variables for anything that gets compiled by Sage it seems.

Nathann

comment:27 Changed 6 years ago by dcoudert

So unless someone has a good solution, we can make a patch containing only the hakmem methods, and if one is later able to make __builtin_popcount running, then we may switch to it. Do you agree?

comment:28 Changed 6 years ago by ncohen

Yep, it makes sense but it is a pity :-/

I wrote to sage-devel, just in case ...

https://groups.google.com/d/topic/sage-devel/FIxzZv0L_lo/discussion

Nathann

comment:29 Changed 6 years ago by jdemeyer

I think a great suggestion implied by #14704 is to use mpn_popcount() for this. That's essentially one line of code to call that function :-)

comment:30 Changed 6 years ago by jdemeyer

I'm working on a patch which does this...

comment:31 follow-up: Changed 6 years ago by dcoudert

That would be great if you have the good trick to use that method. I tried some time ago to use mpz_popcount and failed.

comment:32 Changed 6 years ago by jdemeyer

I'm also merging bitset_pxd.pxi and bitset.pxd (I see no reason why these would be two separate files) and moving the doctests to bitset.pyx.

Last edited 6 years ago by jdemeyer (previous) (diff)

comment:33 follow-up: Changed 6 years ago by ncohen

Hmmmmmmmmmm O_O

You are on a dangerous road, Jeroen. The road to many broken dependencies in the graph/ folder :-PPPPPP

Nathann

comment:34 in reply to: ↑ 33 Changed 6 years ago by jdemeyer

Replying to ncohen:

Hmmmmmmmmmm O_O

You are on a dangerous road, Jeroen. The road to many broken dependencies in the graph/ folder :-PPPPPP

You mean because of bitset_pxd.pxi? I could simply leave that file with a one-liner

from sage.misc.bitset cimport *

if you prefer.

Last edited 6 years ago by jdemeyer (previous) (diff)

comment:35 Changed 6 years ago by jdemeyer

  • Authors changed from David Coudert to David Coudert, Jeroen Demeyer
  • Description modified (diff)

comment:36 in reply to: ↑ 31 Changed 6 years ago by jdemeyer

Replying to dcoudert:

That would be great if you have the good trick to use that method. I tried some time ago to use mpz_popcount and failed.

The trick is to use mpn_popcount instead of mpz_popcount.

comment:37 Changed 6 years ago by jdemeyer

Speed is much better too.

With trac_13352_bitset_len_v3.patch:

sage: B = Bitset('10'*10000)
sage: timeit('len(B)',repeat=100,number=10000)
10000 loops, best of 100: 1.24 µs per loop
sage: B = Bitset('00'*10000)
sage: timeit('len(B)',repeat=100,number=10000)
10000 loops, best of 100: 254 ns per loop

With trac_13352_bitset_len_v4.patch:

sage: B = Bitset('10'*10000)
sage: timeit('len(B)',repeat=100,number=10000)
10000 loops, best of 100: 168 ns per loop
sage: B = Bitset('00'*10000)
sage:  timeit('len(B)',repeat=100,number=10000)
10000 loops, best of 100: 168 ns per loop
Last edited 6 years ago by jdemeyer (previous) (diff)

comment:38 Changed 6 years ago by jdemeyer

Similar improvements could be made to other bitset functions (in particular to bitset_next() and the like). But that would be for a new ticket...

comment:39 Changed 6 years ago by jdemeyer

Strangely, many places were importing bitset without actually using bitsets... so I removed those redundant imports.

Even stranger were two files sage/combinat/debruijn_sequence.pxd and sage/matroids/unpickling.pxd containing just one line

include 'sage/misc/bitset_pxd.pxi' 

which was in both cases unused.

Changed 6 years ago by jdemeyer

comment:40 follow-up: Changed 6 years ago by dcoudert

Excellent! The patch is working perfectly (install, doctests on sage/ , etc.). The simplification of the hamming weight method is also very good.

Thank you very much.

Should we let Nathann conclude on the status of this ticket?

comment:41 Changed 6 years ago by jdemeyer

See also #10093 for another bitset-related ticket ready for review.

comment:42 in reply to: ↑ 40 Changed 6 years ago by ncohen

Should we let Nathann conclude on the status of this ticket?

No sorry, please deal with this without me, I'm already juggling with 1000 patches right now O_o

Nathann

comment:43 Changed 6 years ago by dcoudert

  • Status changed from needs_review to positive_review

For me the patch proposed by Jeroen is OK (and I can install #10093 on top of it), so I set its status to positive.

comment:44 Changed 6 years ago by jdemeyer

  • Reviewers set to David Coudert

comment:45 Changed 6 years ago by jdemeyer

This is ridiculous and shows how fast mpn_popcount() is:

sage: B = FrozenBitset(capacity=10^5)
sage: timeit('B.isempty()', repeat=100, number=10000)
10000 loops, best of 100: 980 ns per loop
sage: timeit('len(B)', repeat=100, number=10000)
10000 loops, best of 100: 573 ns per loop

comment:46 follow-up: Changed 6 years ago by ncohen

Hmmmmmm... Would you happen to have the popcount instruction on the computer on which you try this ?

Nathann

comment:47 in reply to: ↑ 46 Changed 6 years ago by jdemeyer

Replying to ncohen:

Hmmmmmm... Would you happen to have the popcount instruction on the computer on which you try this ?

I think so, yes.

comment:48 Changed 6 years ago by jdemeyer

  • Merged in set to sage-5.13.rc0
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:49 Changed 6 years ago by jason

  • Description modified (diff)

comment:50 Changed 6 years ago by dcoudert

  • Description modified (diff)

I changed the description since we use mpn_popcount and not ___builtin_popcount

comment:51 Changed 5 years ago by jdemeyer

Follow-up: #16937

Note: See TracTickets for help on using tickets.