Opened 5 years ago

Closed 5 years ago

#17196 closed enhancement (fixed)

Relax assumptions on bitset operations

Reported by: jdemeyer Owned by:
Priority: major Milestone: sage-6.4
Component: misc Keywords:
Cc: ncohen Merged in:
Authors: Jeroen Demeyer, Simon King Reviewers: Simon King
Report Upstream: N/A Work issues:
Branch: 23762a3 (Commits) Commit: 23762a31e383d765cbfe6d5bf958d5d592e1513d
Dependencies: Stopgaps:

Description (last modified by jdemeyer)

Most bitset operations state something like

    We assume the two sets have the same size.  Otherwise, the
    behavior of this function is undefined and the function may
    segfault.

However, in fact most of these requirements can be relaxed.

In many cases, I simply change the comment to indicate what the real conditions are. For shifts, I actually change the implementation slightly such that the condition becomes r.size <= a.size + n which is useful for #15820.

Also move bitsets to src/sage/data_structures.

Change History (33)

comment:1 Changed 5 years ago by ncohen

  • Cc ncohen added

comment:2 Changed 5 years ago by jdemeyer

  • Description modified (diff)
  • Summary changed from Clarify assumptions on bitset operations to Relax assumptions on bitset operations

comment:3 Changed 5 years ago by jdemeyer

  • Branch set to u/jdemeyer/ticket/17196
  • Created changed from 10/22/14 13:12:30 to 10/22/14 13:12:30
  • Modified changed from 10/22/14 14:17:27 to 10/22/14 14:17:27

comment:4 Changed 5 years ago by SimonKing

  • Commit set to 83f8a5648815d7c25eb0c9f9b4bdaf04dd90335e

Is it ready for a review?


New commits:

83f8a56Relax assumptions for bitset functions

comment:5 Changed 5 years ago by SimonKing

Something I'd like to see is the empty bitset. The fact that the size of a bitset must be positive caused me some headache at #15820, where I need bounded integer sequences of length zero. And I think it is not a good solution to fake-use bitsets of size 1 if the actual size is zero.

comment:6 Changed 5 years ago by SimonKing

Besides, while we are at it: Shouldn't it be moved to sage.data_structures?

comment:7 Changed 5 years ago by SimonKing

For the record: I am trying the move to sage.data_structures right now (hope we are not having a race condition in our commits...)

comment:8 Changed 5 years ago by SimonKing

  • Branch changed from u/jdemeyer/ticket/17196 to public/17196/relax-assumptions-on-bitset-operations

comment:9 Changed 5 years ago by SimonKing

  • Commit changed from 83f8a5648815d7c25eb0c9f9b4bdaf04dd90335e to 3ae75a1f8fcfee0f78faf9bb4bb006149b535c45

The commit that I pushed moves bitsets to the new data_structures folder. I am running tests now. So far it works well...


New commits:

3ae75a1Move bitset to sage.data_structures

comment:10 Changed 5 years ago by SimonKing

What I try to do next: Empty bitsets.

Question: Should one perhaps create different functions bitset_init_copy(res,src) vs. bitset_realloc_copy(res,src), where in the first it is assumed that res is not initialised, whereas the second function will override previous contents of res? And similar for other functions such as shift?

comment:11 follow-up: Changed 5 years ago by SimonKing

How clever are compilers?

The question is related to idioms like n % GMP_LIMB_BITS.

GMP_LIMB_BITS is a power of two, and it is known at compile time. Hence, n % GMP_LIMB_BITS does not involve a generally very expensive %-operation, but boils down to a simple &.

Will compilers automatically use this optimization? If not, then I think we should introduce an integer constant mod_mp_bits_per_limb (see #15820), so that n % GMP_LIMB_BITS == n & mp_bits_per_limb. If yes, then I guess one can also avoid n >> index_shift, and use n//GMP_LIMB_BITS instead.

comment:12 follow-up: Changed 5 years ago by SimonKing

One place or another (i.e., here or #15820), we will have to deal with the corner case of size zero. The reason: "A common requirement for all functions is that each source area needs at least one limb. No size argument may be zero." See GMP low level functions. I suggest we deal with it here.

comment:13 Changed 5 years ago by SimonKing

It seems that compilers are clever. On my 32-bit laptop:

sage: cython("""
....: cpdef unsigned int f1(unsigned int size):
....:     return (size -1)/(8*sizeof(unsigned int)) +1
....: cpdef unsigned int f2(unsigned int size):
....:     return ((size -1)>>5)+1
""")
sage: f1(89)
3L
sage: f2(89)
3L
sage: %timeit f1(89)
1000000 loops, best of 3: 376 ns per loop
sage: %timeit f2(89)
1000000 loops, best of 3: 371 ns per loop
sage: %timeit f1(89)
1000000 loops, best of 3: 377 ns per loop
sage: %timeit f2(89)
1000000 loops, best of 3: 375 ns per loop
sage: %timeit f1(189)
1000000 loops, best of 3: 377 ns per loop
sage: %timeit f2(189)
1000000 loops, best of 3: 382 ns per loop
sage: cython("""
....: cdef unsigned int bpl = 32
....: cdef unsigned int modbpl = 31
....: cpdef unsigned int f1(unsigned int size):
....:     return size%bpl
....: cpdef unsigned int f2(unsigned int size):
....:     return size&modbpl
....: """)
sage: f1(89)
25L
sage: f2(89)
25L
sage: %timeit f1(89)
1000000 loops, best of 3: 380 ns per loop
sage: %timeit f2(89)
1000000 loops, best of 3: 375 ns per loop
sage: %timeit f1(189)
1000000 loops, best of 3: 382 ns per loop
sage: %timeit f2(189)
1000000 loops, best of 3: 375 ns per loop

So, not really a difference...

comment:14 in reply to: ↑ 12 ; follow-up: Changed 5 years ago by jdemeyer

Replying to SimonKing:

I suggest we deal with it here.

I would actually suggest the opposite. Keep assuming that bitsets have size > 0 and implement empty biseqs using bitsets of size 1. I think this will be the simplest.

Why do you think this is a bad idea?

Last edited 5 years ago by jdemeyer (previous) (diff)

comment:15 in reply to: ↑ 14 Changed 5 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

I suggest we deal with it here.

I would actually suggest the opposite. Keep assuming that bitsets have size > 0 and implement empty biseqs using bitsets of size 1. I think this will be the simplest.

Why do you think this is a bad idea?

Since I suppose people would think that the empty set should be available as a bitset.

comment:16 Changed 5 years ago by SimonKing

  • Authors changed from Jeroen Demeyer to Jeroen Demeyer, Simon King
  • Status changed from new to needs_review

Anyway, if you think that there should be no empty bitset and if you agree that it should be moved to sage.data_structures.bitset, then it needs review, since all tests pass and the doc correctly appears in sage.data_structures.

comment:17 Changed 5 years ago by jdemeyer

  • Description modified (diff)

comment:18 Changed 5 years ago by jdemeyer

  • Description modified (diff)

comment:19 follow-up: Changed 5 years ago by ncohen

About moving that to structure... If you want to move bitsets from misc to structure and so make this 'structure' folder mean "data structure", could you also move the other data structure files from misc there ? I looked at the list and it seems that only binary_tree.* and binary_matrix.* are there.

Nathann

comment:20 in reply to: ↑ 19 Changed 5 years ago by jdemeyer

Replying to ncohen:

About moving that to structure... If you want to move bitsets from misc to structure and so make this 'structure' folder mean "data structure", could you also move the other data structure files from misc there ?

It's not structure, it's data_structures. And moving other things should be not on this ticket.

comment:21 follow-up: Changed 5 years ago by ncohen

Oh.

So we will have two folders named structure/, data_structure/, and data structure code in both data_structure/ and misc/.

Cool.

Nathan

comment:22 in reply to: ↑ 21 ; follow-ups: Changed 5 years ago by jdemeyer

Replying to ncohen:

data structure code in both data_structure/ and misc/.

and in structure/ and in matroids/ and in combinat/ and probably other places...

comment:23 Changed 5 years ago by jdemeyer

  • Branch changed from public/17196/relax-assumptions-on-bitset-operations to u/jdemeyer/ticket/17196
  • Modified changed from 10/23/14 09:13:19 to 10/23/14 09:13:19

comment:24 Changed 5 years ago by jdemeyer

  • Commit changed from 3ae75a1f8fcfee0f78faf9bb4bb006149b535c45 to 23762a31e383d765cbfe6d5bf958d5d592e1513d

New commits:

23762a3Import data_structures into global namespace

comment:25 in reply to: ↑ 22 ; follow-up: Changed 5 years ago by SimonKing

Replying to jdemeyer:

Replying to ncohen:

data structure code in both data_structure/ and misc/.

and in structure/ and in matroids/ and in combinat/ and probably other places...

Nathann, if I understand correctly, the plan is to eventually move

  • all sage-specific data structures (e.g. those that only make sense within Sage's coercion framework) to structure,
  • all data structures that make sense without Sage and are not tied to a specific mathematical topic (bitsets, bitmatrices, bounded integer sequences, the dictionaries that are currently defined in sage.structure.coerce_dict) to data_structures

and then misc should be reserved to miscellaneous mathematical topics (I think this is the definition according to the docs).

comment:26 in reply to: ↑ 22 Changed 5 years ago by SimonKing

Replying to jdemeyer:

Replying to ncohen:

data structure code in both data_structure/ and misc/.

and in structure/ and in matroids/ and in combinat/ and probably other places...

PS: Of course there will be data structure code in many places, for example in sage.matrices. This is because such data structures are designed for a specific mathematical purpose (e.g., implementing matrix algebras).

comment:27 in reply to: ↑ 25 ; follow-up: Changed 5 years ago by ncohen

Yo !

Nathann, if I understand correctly, the plan is to eventually move

  • all sage-specific data structures (e.g. those that only make sense within Sage's coercion framework) to structure,
  • all data structures that make sense without Sage and are not tied to a specific mathematical topic (bitsets, bitmatrices, bounded integer sequences, the dictionaries that are currently defined in sage.structure.coerce_dict) to data_structures

and then misc should be reserved to miscellaneous mathematical topics (I think this is the definition according to the docs).

I understand. I merely mentionned that by only moving bitsets here the code would be in a weird state where only bitsets are in the new folder while others are still scattered.

It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?

Nathann

comment:28 in reply to: ↑ 27 ; follow-up: Changed 5 years ago by jdemeyer

Replying to ncohen:

It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?

I wouldn't rename structure because it will cause too much trouble. It contains very fundamental modules, with lots of code depending on it. Fixing the Sage library itself is not so hard, but the annoying part is all the tickets which currently sit on Trac and import stuff from structure.

comment:29 in reply to: ↑ 11 Changed 5 years ago by jdemeyer

Replying to SimonKing:

How clever are compilers?

The question is related to idioms like n % GMP_LIMB_BITS.

GMP_LIMB_BITS is a power of two, and it is known at compile time. Hence, n % GMP_LIMB_BITS does not involve a generally very expensive %-operation, but boils down to a simple &.

Will compilers automatically use this optimization?

Yes, they should. It's an obvious easy optimization.

comment:30 in reply to: ↑ 28 Changed 5 years ago by ncohen

It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?

I wouldn't rename structure because it will cause too much trouble. It contains very fundamental modules, with lots of code depending on it. Fixing the Sage library itself is not so hard, but the annoying part is all the tickets which currently sit on Trac and import stuff from structure.

Come on, since when do we have to be compatible with the patches in needs_review/needs_work ?

Nathann

comment:31 Changed 5 years ago by SimonKing

  • Reviewers set to Simon King

For the record: I like the changes introduced by Jeroen, and it seems to work well in #15820. Hence, I give a positive review to the part of the code that I didn't author.

comment:32 Changed 5 years ago by jdemeyer

  • Status changed from needs_review to positive_review

comment:33 Changed 5 years ago by vbraun

  • Branch changed from u/jdemeyer/ticket/17196 to 23762a31e383d765cbfe6d5bf958d5d592e1513d
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.