Opened 8 years ago
Closed 8 years ago
#17196 closed enhancement (fixed)
Relax assumptions on bitset operations
Reported by:  Jeroen Demeyer  Owned by:  

Priority:  major  Milestone:  sage6.4 
Component:  misc  Keywords:  
Cc:  Nathann Cohen  Merged in:  
Authors:  Jeroen Demeyer, Simon King  Reviewers:  Simon King 
Report Upstream:  N/A  Work issues:  
Branch:  23762a3 (Commits, GitHub, GitLab)  Commit:  23762a31e383d765cbfe6d5bf958d5d592e1513d 
Dependencies:  Stopgaps: 
Description (last modified by )
Most bitset operations state something like
We assume the two sets have the same size. Otherwise, the behavior of this function is undefined and the function may segfault.
However, in fact most of these requirements can be relaxed.
In many cases, I simply change the comment to indicate what the real conditions are. For shifts, I actually change the implementation slightly such that the condition becomes r.size <= a.size + n
which is useful for #15820.
Also move bitsets to src/sage/data_structures
.
Change History (33)
comment:1 Changed 8 years ago by
Cc:  Nathann Cohen added 

comment:2 Changed 8 years ago by
Description:  modified (diff) 

Summary:  Clarify assumptions on bitset operations → Relax assumptions on bitset operations 
comment:3 Changed 8 years ago by
Branch:  → u/jdemeyer/ticket/17196 

Created:  Oct 22, 2014, 1:12:30 PM → Oct 22, 2014, 1:12:30 PM 
Modified:  Oct 22, 2014, 2:17:27 PM → Oct 22, 2014, 2:17:27 PM 
comment:4 Changed 8 years ago by
Commit:  → 83f8a5648815d7c25eb0c9f9b4bdaf04dd90335e 

comment:5 Changed 8 years ago by
Something I'd like to see is the empty bitset. The fact that the size of a bitset must be positive caused me some headache at #15820, where I need bounded integer sequences of length zero. And I think it is not a good solution to fakeuse bitsets of size 1 if the actual size is zero.
comment:6 Changed 8 years ago by
Besides, while we are at it: Shouldn't it be moved to sage.data_structures
?
comment:7 Changed 8 years ago by
For the record: I am trying the move to sage.data_structures right now (hope we are not having a race condition in our commits...)
comment:8 Changed 8 years ago by
Branch:  u/jdemeyer/ticket/17196 → public/17196/relaxassumptionsonbitsetoperations 

comment:9 Changed 8 years ago by
Commit:  83f8a5648815d7c25eb0c9f9b4bdaf04dd90335e → 3ae75a1f8fcfee0f78faf9bb4bb006149b535c45 

The commit that I pushed moves bitsets to the new data_structures folder. I am running tests now. So far it works well...
New commits:
3ae75a1  Move bitset to sage.data_structures

comment:10 Changed 8 years ago by
What I try to do next: Empty bitsets.
Question: Should one perhaps create different functions bitset_init_copy(res,src)
vs. bitset_realloc_copy(res,src)
, where in the first it is assumed that res
is not initialised, whereas the second function will override previous contents of res
? And similar for other functions such as shift?
comment:11 followup: 29 Changed 8 years ago by
How clever are compilers?
The question is related to idioms like n % GMP_LIMB_BITS
.
GMP_LIMB_BITS
is a power of two, and it is known at compile time. Hence, n % GMP_LIMB_BITS
does not involve a generally very expensive %
operation, but boils down to a simple &
.
Will compilers automatically use this optimization? If not, then I think we should introduce an integer constant mod_mp_bits_per_limb
(see #15820), so that n % GMP_LIMB_BITS == n & mp_bits_per_limb
. If yes, then I guess one can also avoid n >> index_shift
, and use n//GMP_LIMB_BITS
instead.
comment:12 followup: 14 Changed 8 years ago by
One place or another (i.e., here or #15820), we will have to deal with the corner case of size zero. The reason: "A common requirement for all functions is that each source area needs at least one limb. No size argument may be zero." See GMP low level functions. I suggest we deal with it here.
comment:13 Changed 8 years ago by
It seems that compilers are clever. On my 32bit laptop:
sage: cython(""" ....: cpdef unsigned int f1(unsigned int size): ....: return (size 1)/(8*sizeof(unsigned int)) +1 ....: cpdef unsigned int f2(unsigned int size): ....: return ((size 1)>>5)+1 """) sage: f1(89) 3L sage: f2(89) 3L sage: %timeit f1(89) 1000000 loops, best of 3: 376 ns per loop sage: %timeit f2(89) 1000000 loops, best of 3: 371 ns per loop sage: %timeit f1(89) 1000000 loops, best of 3: 377 ns per loop sage: %timeit f2(89) 1000000 loops, best of 3: 375 ns per loop sage: %timeit f1(189) 1000000 loops, best of 3: 377 ns per loop sage: %timeit f2(189) 1000000 loops, best of 3: 382 ns per loop
sage: cython(""" ....: cdef unsigned int bpl = 32 ....: cdef unsigned int modbpl = 31 ....: cpdef unsigned int f1(unsigned int size): ....: return size%bpl ....: cpdef unsigned int f2(unsigned int size): ....: return size&modbpl ....: """) sage: f1(89) 25L sage: f2(89) 25L sage: %timeit f1(89) 1000000 loops, best of 3: 380 ns per loop sage: %timeit f2(89) 1000000 loops, best of 3: 375 ns per loop sage: %timeit f1(189) 1000000 loops, best of 3: 382 ns per loop sage: %timeit f2(189) 1000000 loops, best of 3: 375 ns per loop
So, not really a difference...
comment:14 followup: 15 Changed 8 years ago by
Replying to SimonKing:
I suggest we deal with it here.
I would actually suggest the opposite. Keep assuming that bitsets have size > 0 and implement empty biseqs using bitsets of size 1. I think this will be the simplest.
Why do you think this is a bad idea?
comment:15 Changed 8 years ago by
Replying to jdemeyer:
Replying to SimonKing:
I suggest we deal with it here.
I would actually suggest the opposite. Keep assuming that bitsets have size > 0 and implement empty biseqs using bitsets of size 1. I think this will be the simplest.
Why do you think this is a bad idea?
Since I suppose people would think that the empty set should be available as a bitset.
comment:16 Changed 8 years ago by
Authors:  Jeroen Demeyer → Jeroen Demeyer, Simon King 

Status:  new → needs_review 
Anyway, if you think that there should be no empty bitset and if you agree that it should be moved to sage.data_structures.bitset, then it needs review, since all tests pass and the doc correctly appears in sage.data_structures.
comment:17 Changed 8 years ago by
Description:  modified (diff) 

comment:18 Changed 8 years ago by
Description:  modified (diff) 

comment:19 followup: 20 Changed 8 years ago by
About moving that to structure... If you want to move bitsets from misc to structure and so make this 'structure' folder mean "data structure", could you also move the other data structure files from misc there ? I looked at the list and it seems that only binary_tree.*
and binary_matrix.*
are there.
Nathann
comment:20 Changed 8 years ago by
Replying to ncohen:
About moving that to structure... If you want to move bitsets from misc to structure and so make this 'structure' folder mean "data structure", could you also move the other data structure files from misc there ?
It's not structure
, it's data_structures
. And moving other things should be not on this ticket.
comment:21 followup: 22 Changed 8 years ago by
Oh.
So we will have two folders named structure/
, data_structure/
, and data structure code in both data_structure/
and misc/
.
Cool.
Nathan
comment:22 followups: 25 26 Changed 8 years ago by
Replying to ncohen:
data structure code in both
data_structure/
andmisc/
.
and in structure/
and in matroids/
and in combinat/
and probably other places...
comment:23 Changed 8 years ago by
Branch:  public/17196/relaxassumptionsonbitsetoperations → u/jdemeyer/ticket/17196 

Modified:  Oct 23, 2014, 9:13:19 AM → Oct 23, 2014, 9:13:19 AM 
comment:24 Changed 8 years ago by
Commit:  3ae75a1f8fcfee0f78faf9bb4bb006149b535c45 → 23762a31e383d765cbfe6d5bf958d5d592e1513d 

New commits:
23762a3  Import data_structures into global namespace

comment:25 followup: 27 Changed 8 years ago by
Replying to jdemeyer:
Replying to ncohen:
data structure code in both
data_structure/
andmisc/
.and in
structure/
and inmatroids/
and incombinat/
and probably other places...
Nathann, if I understand correctly, the plan is to eventually move
 all sagespecific data structures (e.g._{ those that only make sense within Sage's coercion framework) to structure, }
 all data structures that make sense without Sage and are not tied to a specific mathematical topic (bitsets, bitmatrices, bounded integer sequences, the dictionaries that are currently defined in
sage.structure.coerce_dict
) todata_structures
and then misc
should be reserved to miscellaneous mathematical topics (I think this is the definition according to the docs).
comment:26 Changed 8 years ago by
Replying to jdemeyer:
Replying to ncohen:
data structure code in both
data_structure/
andmisc/
.and in
structure/
and inmatroids/
and incombinat/
and probably other places...
PS: Of course there will be data structure code in many places, for example in sage.matrices
. This is because such data structures are designed for a specific mathematical purpose (e.g., implementing matrix algebras).
comment:27 followup: 28 Changed 8 years ago by
Yo !
Nathann, if I understand correctly, the plan is to eventually move
 all sagespecific data structures (e.g._{ those that only make sense within Sage's coercion framework) to structure, }
 all data structures that make sense without Sage and are not tied to a specific mathematical topic (bitsets, bitmatrices, bounded integer sequences, the dictionaries that are currently defined in
sage.structure.coerce_dict
) todata_structures
and then
misc
should be reserved to miscellaneous mathematical topics (I think this is the definition according to the docs).
I understand. I merely mentionned that by only moving bitsets here the code would be in a weird state where only bitsets are in the new folder while others are still scattered.
It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?
Nathann
comment:28 followup: 30 Changed 8 years ago by
Replying to ncohen:
It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?
I wouldn't rename structure
because it will cause too much trouble. It contains very fundamental modules, with lots of code depending on it. Fixing the Sage library itself is not so hard, but the annoying part is all the tickets which currently sit on Trac and import stuff from structure
.
comment:29 Changed 8 years ago by
Replying to SimonKing:
How clever are compilers?
The question is related to idioms like
n % GMP_LIMB_BITS
.
GMP_LIMB_BITS
is a power of two, and it is known at compile time. Hence,n % GMP_LIMB_BITS
does not involve a generally very expensive%
operation, but boils down to a simple&
.Will compilers automatically use this optimization?
Yes, they should. It's an obvious easy optimization.
comment:30 Changed 8 years ago by
It also feels weird to have a directory named "structure" and one named "data structures". Do you have any idea of how "structure" could be renamed ?
I wouldn't rename
structure
because it will cause too much trouble. It contains very fundamental modules, with lots of code depending on it. Fixing the Sage library itself is not so hard, but the annoying part is all the tickets which currently sit on Trac and import stuff fromstructure
.
Come on, since when do we have to be compatible with the patches in needs_review
/needs_work
?
Nathann
comment:31 Changed 8 years ago by
Reviewers:  → Simon King 

For the record: I like the changes introduced by Jeroen, and it seems to work well in #15820. Hence, I give a positive review to the part of the code that I didn't author.
comment:32 Changed 8 years ago by
Status:  needs_review → positive_review 

comment:33 Changed 8 years ago by
Branch:  u/jdemeyer/ticket/17196 → 23762a31e383d765cbfe6d5bf958d5d592e1513d 

Resolution:  → fixed 
Status:  positive_review → closed 
Is it ready for a review?
New commits:
Relax assumptions for bitset functions