#15820 closed enhancement (fixed)
Implement sequences of bounded integers
Reported by:  SimonKing  Owned by:  

Priority:  major  Milestone:  sage6.4 
Component:  algebra  Keywords:  sequence bounded integer 
Cc:  JStarx, jhpalmieri, was, stumpc5, saliola, SimonKing, gmoose05, nthiery, virmaux, ncohen, hivert  Merged in:  
Authors:  Simon King, Jeroen Demeyer  Reviewers:  Jeroen Demeyer, Simon King 
Report Upstream:  N/A  Work issues:  
Branch:  4e9e1c5 (Commits, GitHub, GitLab)  Commit:  
Dependencies:  #17195, #17196  Stopgaps: 
Description (last modified by )
#12630 has introduced tools to compute with quiver algebras and quiver representations. It is all Python code, paths being internally stored by lists of labelled edges of digraphs.
The aim of this ticket is to provide an efficient implementation of lists of bounded integers, with concatenation. This can be used to implement paths of a quiver more efficiently, but can also have different applications (say, elements of free groups, or other applications in sagecombinat).
Attachments (1)
Change History (385)
comment:1 Changed 7 years ago by
comment:2 followup: ↓ 3 Changed 7 years ago by
I'd say:
 Some implementation without limitations (the one from the previous ticket I guess)
 Whichever of 14 that covers your needs (256 edges for Gröbner bases calculations this should be already quite big, right?)
 As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.
But that's just my 2 cents ... I don't have specific experience!
Cheers,
comment:3 in reply to: ↑ 2 Changed 7 years ago by
Hi Nicolas,
Replying to nthiery:
 Some implementation without limitations (the one from the previous ticket I guess)
An implementation which represents edges by unsigned long long
can be considered "unlimited".
 Whichever of 14 that covers your needs (256 edges for Gröbner bases calculations this should be already quite big, right?)
Yes, but I want a general code.
 As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.
Actually, on the level of paths, the implementations come in pairs: The compressed representations differ from the noncompressedat the end of the day, we just have different ways of representing sequences of integers. Only when it comes to interpreting a sequence of integers as a path in a quiver will we have further differences: Does "[1,2]" mean "start with arrow 1 of the quiver and continue with arrow 2 of the quiver", or does it mean "travel along the outgoing arrow 1 of the vertex we are currently located at, and then continue with outgoing arrow 2 of the vertex we just arrived at".
Currently, I prefer a compressed representation by unsigned long long*
. Rationale:
 This will work even for extremely large quiversthe only restriction is, that the number of edges can be represented as an
unsigned long long
(a larger quiver won't fit into computer's memory anyway).  For small quivers, even paths of length 10 will easily fit into a single
unsigned long long
. That's quite efficient, both memory and speedwise.
A bit later I will post experimental code (not yet talking about quiver paths but about sequences of integers) giving some evidence on efficiency. And then we can still see how we interpret the integer sequences: Having a global or a local list of arrows?
comment:4 Changed 7 years ago by
I have attached a toy (or proofofconcept) implementation; see path_test.pyx. In a real implementation in
compressed representation, it would be the job of a PathSemigroup
to find
out how many edges fit into one word. And of course, in the toy implementation
I am not trying to translate an integer sequence into the string
representation of an element of the path semigroup.
What implementation would you suggest to use in "reality"? Is there another implementation of "sequences of bounded integers" that you would recommend? And would you enumerate the arrows of a quiver locally (i.e., only number the outgoing edges at each vertex) or globally? The former will result in better compression, but would make it more difficult to test for paths p,q whether there are paths r,s with p==r*q*s
.
Now for the timings, and sorry for the long post...
Here, I am timing concatenation, hash, comparison, comparing initial segments. In all examples, we use the following paths of various sizes.
p5 = Path(ZZ, 5, range(5), 1, 1) p25 = Path(ZZ, 25, range(25), 1, 1) p50 = Path(ZZ, 50, range(50), 1, 1) p100 = Path(ZZ, 100, range(100), 1, 1) p1000 = Path(ZZ, 1000, range(1000), 1, 1) q5 = Path(ZZ, 5, range(1,6), 1, 1) q25 = Path(ZZ, 25, range(1,26), 1, 1) q50 = Path(ZZ, 50, range(1,51), 1, 1) q100 = Path(ZZ, 100, range(1,101), 1, 1) q1000 = Path(ZZ, 1000, range(1,1001), 1, 1) r5 = Path(ZZ, 5, range(1,5)+[0], 1, 1) r25 = Path(ZZ, 25, range(1,25)+[0], 1, 1) r50 = Path(ZZ, 50, range(1,50)+[0], 1, 1) r100 = Path(ZZ, 100, range(1,100)+[0], 1, 1) r1000 = Path(ZZ, 1000, range(1,1000)+[0], 1, 1) s5 = Path(ZZ, 5, range(5), 1, 1) s25 = Path(ZZ, 25, range(25), 1, 1) s50 = Path(ZZ, 50, range(50), 1, 1) s100 = Path(ZZ, 100, range(100), 1, 1) s1000 = Path(ZZ, 1000, range(1000), 1, 1)
We are doing the following tests.
Concatenation:
%timeit p5*q5 %timeit p5*q25 %timeit p25*q25 %timeit p25*q50 %timeit p50*q50 %timeit p50*q100 %timeit p100*q100 %timeit p100*q1000 %timeit p1000*q1000
Hash:
%timeit hash(p5) %timeit hash(p25) %timeit hash(p50) %timeit hash(p100) %timeit hash(p1000)
Comparison:
 equal
%timeit p5==s5 %timeit p25==s25 %timeit p50==s50 %timeit p100==s100 %timeit p1000==s1000
 obviously different (first item differs)
%timeit p5==r5 %timeit p25==r25 %timeit p50==r50 %timeit p100==r100 %timeit p1000==r1000
 less obviously different (only the last item differs)
%timeit q5==r5 %timeit q25==r25 %timeit q50==r50 %timeit q100==r100 %timeit q1000==r1000
Startswith:
%timeit q1000.startswith(q100) # it does start with %timeit q1000.startswith(r100) # it doesn't start with
Uncompressed integer lists
Here, we define Path=Path_v1
(see path_test.pyx).
Using ctypedef unsigned long block
sage: %timeit p5*q5 1000000 loops, best of 3: 739 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 769 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 801 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 810 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 885 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 959 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 1.02 us per loop sage: %timeit p100*q1000 100000 loops, best of 3: 1.99 us per loop sage: %timeit p1000*q1000 100000 loops, best of 3: 3.06 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 135 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 166 ns per loop sage: %timeit hash(p50) 1000000 loops, best of 3: 214 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 333 ns per loop sage: %timeit hash(p1000) 100000 loops, best of 3: 2.09 us per loop sage: %timeit p5==s5 1000000 loops, best of 3: 1.63 us per loop sage: %timeit p25==s25 100000 loops, best of 3: 5.91 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 11 us per loop sage: %timeit p100==s100 10000 loops, best of 3: 23 us per loop sage: %timeit p1000==s1000 1000 loops, best of 3: 212 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 634 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 659 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 659 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 686 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 698 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 1.65 us per loop sage: %timeit q25==r25 100000 loops, best of 3: 5.83 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 11.5 us per loop sage: %timeit q100==r100 10000 loops, best of 3: 22 us per loop sage: %timeit q1000==r1000 1000 loops, best of 3: 213 us per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 325 ns per loop sage: %timeit 
q1000.startswith(r100) 1000000 loops, best of 3: 311 ns per loop
Using ctypedef unsigned short block
sage: %timeit p5*q5 1000000 loops, best of 3: 700 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 738 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 820 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 769 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 808 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 877 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 909 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.49 us per loop sage: %timeit p1000*q1000 100000 loops, best of 3: 2.13 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 138 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 179 ns per loop sage: %timeit hash(p50) 1000000 loops, best of 3: 236 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 383 ns per loop sage: %timeit hash(p1000) 100000 loops, best of 3: 2.61 us per loop sage: %timeit p5==s5 1000000 loops, best of 3: 1.08 us per loop sage: %timeit p25==s25 100000 loops, best of 3: 3.42 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 6.53 us per loop sage: %timeit p100==s100 100000 loops, best of 3: 14.9 us per loop sage: %timeit p1000==s1000 10000 loops, best of 3: 137 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 613 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 591 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 602 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 603 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 635 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 1.15 us per loop sage: %timeit q25==r25 100000 loops, best of 3: 3.63 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 6.45 us per loop sage: %timeit q100==r100 100000 loops, best of 3: 12.8 us per loop sage: %timeit q1000==r1000 10000 loops, best of 3: 140 us per loop sage: q1000.startswith(q100) True sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 
336 ns per loop sage: q1000.startswith(r100) False sage: %timeit q1000.startswith(r100) 1000000 loops, best of 3: 325 ns per loop
Using ctypedef unsigned char block
sage: %timeit p5*q5 1000000 loops, best of 3: 679 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 693 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 725 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 760 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 772 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 761 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 790 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.1 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.31 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 133 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 176 ns per loop sage: %timeit hash(p50) 1000000 loops, best of 3: 234 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 374 ns per loop sage: %timeit hash(p1000) 100000 loops, best of 3: 2.53 us per loop sage: %timeit p5==s5 1000000 loops, best of 3: 1.08 us per loop sage: %timeit p25==s25 100000 loops, best of 3: 3.36 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 6.3 us per loop sage: %timeit p100==s100 100000 loops, best of 3: 12.5 us per loop sage: %timeit p1000==s1000 10000 loops, best of 3: 121 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 581 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 589 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 599 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 599 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 636 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 1.06 us per loop sage: %timeit q25==r25 100000 loops, best of 3: 3.46 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 6.51 us per loop sage: %timeit q100==r100 100000 loops, best of 3: 12.7 us per loop sage: %timeit q1000==r1000 10000 loops, best of 3: 121 us per loop sage: q1000.startswith(q100) True sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 
324 ns per loop sage: %timeit q1000.startswith(r100) 1000000 loops, best of 3: 315 ns per loop
Compressed integer lists
Here, we define Path=Path_v2
(see path_test.pyx), and do
sage: SizeManager.set_edge_number(27)
so that there is a total of only 27 edges (resp. 27 is the maximal number of
outgoing arrows on a vertex). In particular, all integers in the sequences
below are taken "mod 27". Since 27 has length 5 bits, a compressed
representation using unsigned char*
won't make sense. Hence, I am only
testing long and short.
Using ctypedef unsigned long block
sage: %timeit p5*q5 1000000 loops, best of 3: 676 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 671 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 683 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 725 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 759 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 787 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 811 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.44 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.61 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 132 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 134 ns per loop sage: %timeit hash(p50) 10000000 loops, best of 3: 142 ns per loop sage: %timeit hash(p100) 10000000 loops, best of 3: 155 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 466 ns per loop sage: %timeit p5==s5 1000000 loops, best of 3: 710 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 1.71 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 2.6 us per loop sage: %timeit p100==s100 100000 loops, best of 3: 4.4 us per loop sage: %timeit p1000==s1000 10000 loops, best of 3: 38 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 692 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 715 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 724 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 729 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 752 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 706 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 1.69 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 2.59 us per loop sage: %timeit q100==r100 100000 loops, best of 3: 4.4 us per loop sage: %timeit q1000==r1000 10000 loops, best of 3: 37.6 us per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 272 ns per loop sage: %timeit 
q1000.startswith(r100) 1000000 loops, best of 3: 257 ns per loop
Using ctypedef unsigned short block
sage: %timeit p5*q5 1000000 loops, best of 3: 718 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 682 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 710 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 740 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 722 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 781 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 858 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.98 us per loop sage: %timeit p1000*q1000 100000 loops, best of 3: 2.17 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 132 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 148 ns per loop sage: %timeit hash(p50) 10000000 loops, best of 3: 168 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 206 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 951 ns per loop sage: %timeit p5==s5 1000000 loops, best of 3: 723 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 1.77 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 2.93 us per loop sage: %timeit p100==s100 100000 loops, best of 3: 5.32 us per loop sage: %timeit p1000==s1000 10000 loops, best of 3: 47.5 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 577 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 586 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 599 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 588 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 607 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 754 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 1.87 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 2.89 us per loop sage: %timeit q100==r100 100000 loops, best of 3: 5.28 us per loop sage: %timeit q1000==r1000 10000 loops, best of 3: 47.3 us per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 272 ns per loop sage: %timeit 
q1000.startswith(r100) 1000000 loops, best of 3: 256 ns per loop
comment:5 Changed 7 years ago by
It turns out that there are bugs in the toy code. But that's no surprise...
comment:6 Changed 7 years ago by
I have fixed the problems in path_test.pyx.
It was:
 bitshift by a wrong offset, and
 the bits that aren't used (e.g., the last 4 bits of an uint64_t when storing 12 arrows of size 5 bits) should be kept blank.
The latter involves an additional bitwise "and". So, probably the timings will slightly deteriorate.
comment:7 Changed 7 years ago by
There is one other variation that I want to test.
Currently, if the number of edges fits into 5 bits, then a 64 bit word is filled with data of 12 edges, plus 4 bits of garbage. The garbage must always be zero bits, for otherwise the shift operations would pollute real data with the garbage bits. To keep the garbage zero involves additional operations.
Alternatively, one could try to put fewer arrows into one word, so that there are no garbage bits. This could be done by encoding each arrow by a number of bits that is a power of 2. Hence, in the setting above, one would encode each edge by 8 rather than 5 bits, fitting 8 arrows into one 64 bit word. It is less dense, however the code gets simpler and should have less overhead.
comment:8 Changed 7 years ago by
I have replaced path_test.pyx again, adding two more implementations.
Do you have further suggestions on storage of lists of bounded integers? If you haven't, then I'll try to assess the timings stated in the various comments, according to what I expect to be useful in my applications.
Here are details on the two new implementations:
In Path_v3
, I use Sage's Integer
class as bit list. Sounds crazy, but it works rather well. I assume that bitshifts and comparison (and that's what we are using here) are internally written in assembler, and for us, the additional benefit is that we do not need to take care about the size of blocks (8 bit? 32 bit? 64 bit? 128 bit?) in which we store stuffInteger
does it for us. Perhaps the timings could be improved further by using the underlying clibrary (GMP's mpz_t
) directly.
In Path_v4
, I provide a slightly simplified version of Path_v2
: The data is compressed, but so that the chunks used to store one arrow fit seamlessly into a memory block. Hence, when we use 64bit blocks, and have 27 arrows, then Path_v2
would store the arrows in chunks of 5 bit (hence, it can store 12 arrows in one memory block, leaving 4 bit empty), whereas Path_v4
would store the arrows in chunks of 8 bit (hence, it can only store 8 arrows in one memory block, but since this fits neatly into one memory block, the code can be simplified).
The timings for Path_v3
:
sage: %timeit p5*q5 1000000 loops, best of 3: 645 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 631 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 635 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 1.23 us per loop sage: %timeit p50*q50 1000000 loops, best of 3: 1.82 us per loop sage: %timeit p50*q100 1000000 loops, best of 3: 1.83 us per loop sage: %timeit p100*q100 1000000 loops, best of 3: 1.37 us per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.82 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.78 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 164 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 197 ns per loop sage: %timeit hash(p50) 1000000 loops, best of 3: 210 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 274 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 1.94 us per loop sage: %timeit p5==s5 1000000 loops, best of 3: 667 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 638 ns per loop sage: %timeit p50==s50 1000000 loops, best of 3: 693 ns per loop sage: %timeit p100==s100 1000000 loops, best of 3: 658 ns per loop sage: %timeit p1000==s1000 1000000 loops, best of 3: 817 ns per loop sage: %timeit p5==r5 1000000 loops, best of 3: 746 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 723 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 769 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 731 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 733 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 710 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 733 ns per loop sage: %timeit q50==r50 1000000 loops, best of 3: 718 ns per loop sage: %timeit q100==r100 1000000 loops, best of 3: 719 ns per loop sage: %timeit q1000==r1000 1000000 loops, best of 3: 755 ns per loop sage: %timeit q1000.startswith(q100) 100000 loops, best of 3: 2.46 us per loop sage: 
%timeit q1000.startswith(r100) 100000 loops, best of 3: 2.45 us per loop
The timings for Path_v4
:
sage: %timeit p5*q5 1000000 loops, best of 3: 730 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 762 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 806 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 849 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 854 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 931 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 930 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.97 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.45 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 136 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 148 ns per loop sage: %timeit hash(p50) 10000000 loops, best of 3: 159 ns per loop sage: %timeit hash(p100) 10000000 loops, best of 3: 182 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 600 ns per loop sage: %timeit p5==s5 1000000 loops, best of 3: 700 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 1.65 us per loop sage: %timeit p50==s50 100000 loops, best of 3: 2.48 us per loop sage: %timeit p100==s100 100000 loops, best of 3: 4.21 us per loop sage: %timeit p1000==s1000 10000 loops, best of 3: 36.4 us per loop sage: %timeit p5==r5 100000 loops, best of 3: 706 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 764 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 740 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 742 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 822 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 693 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 1.6 us per loop sage: %timeit q50==r50 100000 loops, best of 3: 2.54 us per loop sage: %timeit q100==r100 100000 loops, best of 3: 4.27 us per loop sage: %timeit q1000==r1000 10000 loops, best of 3: 37.1 us per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 255 ns per loop sage: %timeit 
q1000.startswith(r100) 1000000 loops, best of 3: 243 ns per loop
comment:9 Changed 7 years ago by
Hmmm. Perhaps I should have a look at sage/misc/bitset.pxi
.
comment:10 Changed 7 years ago by
I couldn't directly use sage/misc/bitset
, but I could learn from it: I have improved the Path_v4
implementation (here, the arrows will be encoded by chunks of memory whose size is a power of 2). This was possible by using memcmp
in some place (not in the method startswith()
, where a loop over the data is faster than memcmp) and, in particular, use bitshift operations to replace multiplications and integer divisions (this is only possible with Path_v4
, since here the factors are powers of 2 and can thus correspond to shifts).
New timings with Path=Path_v4
and SizeManager.set_edge_number(27)
:
sage: %timeit p5*q5 1000000 loops, best of 3: 684 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 726 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 798 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 862 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 827 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 924 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 915 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.91 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.42 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 148 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 158 ns per loop sage: %timeit hash(p50) 10000000 loops, best of 3: 171 ns per loop sage: %timeit hash(p100) 10000000 loops, best of 3: 194 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 594 ns per loop sage: %timeit p5==s5 1000000 loops, best of 3: 498 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 547 ns per loop sage: %timeit p50==s50 1000000 loops, best of 3: 631 ns per loop sage: %timeit p100==s100 1000000 loops, best of 3: 712 ns per loop sage: %timeit p1000==s1000 100000 loops, best of 3: 2.42 us per loop sage: %timeit p5==r5 1000000 loops, best of 3: 499 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 514 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 549 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 526 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 545 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 506 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 544 ns per loop sage: %timeit q50==r50 1000000 loops, best of 3: 589 ns per loop sage: %timeit q100==r100 1000000 loops, best of 3: 704 ns per loop sage: %timeit q1000==r1000 100000 loops, best of 3: 2.48 us per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 263 ns per loop sage: 
%timeit q1000.startswith(r100) 1000000 loops, best of 3: 232 ns per loop
comment:11 Changed 7 years ago by
PS: I chose 27 edges, since this should be particularly bad for Path_v4
. Indeed, in Path_v2
, 12 arrows can be put into one uint64_t
, whereas in Path_v4
only 8 arrows fit into one uint64_t
. So, if Path_v4
turns out to be faster than Path_v2
with 27 edges, then it is likely to be generally faster, I think.
comment:12 Changed 7 years ago by
From my side, this is the last part of the proofofconcept implementations, before assessing the benchmark tests and choosing an implementation for the elements of path semigroups...
In Path_v5
, I am using the GMP library directly, thus avoiding some overhead compared with Path_v3
. The timings I get (with Path=Path_v5
and SizeManager.set_edge_number(27)
):
sage: %timeit p5*q5 1000000 loops, best of 3: 720 ns per loop sage: %timeit p5*q25 1000000 loops, best of 3: 814 ns per loop sage: %timeit p25*q25 1000000 loops, best of 3: 823 ns per loop sage: %timeit p25*q50 1000000 loops, best of 3: 808 ns per loop sage: %timeit p50*q50 1000000 loops, best of 3: 828 ns per loop sage: %timeit p50*q100 1000000 loops, best of 3: 890 ns per loop sage: %timeit p100*q100 1000000 loops, best of 3: 866 ns per loop sage: %timeit p100*q1000 1000000 loops, best of 3: 1.18 us per loop sage: %timeit p1000*q1000 1000000 loops, best of 3: 1.38 us per loop sage: %timeit hash(p5) 10000000 loops, best of 3: 159 ns per loop sage: %timeit hash(p25) 10000000 loops, best of 3: 171 ns per loop sage: %timeit hash(p50) 1000000 loops, best of 3: 198 ns per loop sage: %timeit hash(p100) 1000000 loops, best of 3: 266 ns per loop sage: %timeit hash(p1000) 1000000 loops, best of 3: 1.93 us per loop sage: %timeit p5==s5 1000000 loops, best of 3: 466 ns per loop sage: %timeit p25==s25 1000000 loops, best of 3: 509 ns per loop sage: %timeit p50==s50 1000000 loops, best of 3: 495 ns per loop sage: %timeit p100==s100 1000000 loops, best of 3: 485 ns per loop sage: %timeit p1000==s1000 1000000 loops, best of 3: 693 ns per loop sage: %timeit p5==r5 1000000 loops, best of 3: 451 ns per loop sage: %timeit p25==r25 1000000 loops, best of 3: 468 ns per loop sage: %timeit p50==r50 1000000 loops, best of 3: 463 ns per loop sage: %timeit p100==r100 1000000 loops, best of 3: 468 ns per loop sage: %timeit p1000==r1000 1000000 loops, best of 3: 482 ns per loop sage: %timeit q5==r5 1000000 loops, best of 3: 448 ns per loop sage: %timeit q25==r25 1000000 loops, best of 3: 462 ns per loop sage: %timeit q50==r50 1000000 loops, best of 3: 448 ns per loop sage: %timeit q100==r100 1000000 loops, best of 3: 457 ns per loop sage: %timeit q1000==r1000 1000000 loops, best of 3: 470 ns per loop sage: %timeit q1000.startswith(q100) 1000000 loops, best of 3: 673 ns per loop sage: %timeit 
q1000.startswith(r100) 1000000 loops, best of 3: 657 ns per loop
comment:13 followup: ↓ 14 Changed 7 years ago by
Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing. If the sizeof a block is a power of two it should be quite good, don't you think ?
This being said, perhaps it should be implemented like the bitset stuff, in a algebraicfree file, to store sequences of integers as you said.
How do you define your hashing function and your comparison function, by the way ?
Nathann
comment:14 in reply to: ↑ 13 Changed 7 years ago by
Replying to ncohen:
Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing.
What do you mean by "the last one"? Using GMP's mpz_t
directly, which is Path_v5
in the attachment? Or the improved "semicompressed" version of uint64_t*
(storing chunks of size 2^n
rather then in the smallest possible chunk size), which is Path_v4
?
If the sizeof a block is a power of two it should be quite good, don't you think ?
Should be. My own summary of the timings above would be this. We have 6 benchmarks, and in the following list I am giving for each benchmark a small, a medium and a large example. In the end, +
or 
indicate whether the respective implementation did well/poorly in the three examples.
 Concatenation
5x5 50x50 1000x1000 Uncompressed long : 739 ns 885 ns 3.06 us   Uncompressed short: 700 ns 808 ns 2.13 us Uncompressed char : 679 ns 772 ns 1.31 us + Compressed long : 676 ns 759 ns 1.61 us Compressed short: 718 ns 722 ns 1.98 us + Semicompr. long : 684 ns 827 ns 1.42 us Integer : 645 ns 1.82 us 1.78 us + GMP mpz_t: 723 ns 799 ns 1.41 us
 Hash
5 100 1000 Uncompressed long : 135 ns 333 ns 2.09 us + Uncompressed short: 138 ns 383 ns 2.61 us + Uncompressed char : 133 ns 374 ns 2.53 us + Compressed long : 132 ns 155 ns 466 ns +++ Compressed short: 132 ns 206 ns 951 ns + Semicompr. long : 148 ns 194 ns 594 ns Integer : 164 ns 274 ns 1.94 us  GMP mpz_t: 159 ns 266 ns 1.93 us
 == for equal paths
5 100 1000 Uncompressed long : 1.63 us 23 us 212 us  Uncompressed short: 1.08 us 14.9 us 137 us Uncompressed char : 1.08 us 12.5 us 121 us Compressed long : 710 ns 4.4 us 38 us Compressed short: 723 ns 5.3 us 47.5 us Semicompr. long : 498 ns 712 ns 2.4 us Integer : 667 ns 658 ns 817 ns GMP mpz_t: 466 ns 485 ns 693 ns +++
 == for "easily" unequal paths
5 100 1000 Uncompressed long : 634 ns 686 ns 698 ns Uncompressed short: 613 ns 603 ns 635 ns Uncompressed char : 581 ns 599 ns 636 ns Compressed long : 692 ns 729 ns 752 ns  Compressed short: 577 ns 588 ns 607 ns Semicompr. long : 499 ns 526 ns 545 ns Integer : 746 ns 731 ns 733 ns  GMP mpz_t: 451 ns 468 ns 482 ns +++
 == for "difficult" unequal paths
                     5        100      1000
Uncompressed long :  1.65 us  22 us    213 us
Uncompressed short:  1.15 us  12.8 us  140 us
Uncompressed char :  1.06 us  12.7 us  121 us
Compressed long   :  706 ns   4.4 us   37.6 us
Compressed short  :  754 ns   5.3 us   47.3 us
Semicompr. long   :  506 ns   704 ns   2.5 us
Integer           :  710 ns   719 ns   755 ns
GMP mpz_t         :  448 ns   457 ns   470 ns     +++
 startswith (here we only have one example size, one where the answer is True, and one where the answer is False)
                     yes      no
Uncompressed long :  325 ns   311 ns
Uncompressed short:  336 ns   325 ns
Uncompressed char :  324 ns   315 ns
Compressed long   :  272 ns   257 ns    +
Compressed short  :  272 ns   256 ns    +
Semicompr. long   :  263 ns   232 ns    ++
Integer           :  2.46 us  2.45 us
GMP mpz_t         :  673 ns   657 ns
So, I don't think there is a clear winner in the competition.
This being said, perhaps it should be implemented like the bitset stuff, in an algebra-free file, to store sequences of integers as you said.
Yes, it would make sense.
How do you define your hashing function and your comparison function, by the way ?
Depends on the implementation. Of course, when I use Integer or mpz_t, I am using the hash function provided by GMP. When I implement a block*, where block is either char, uint32_t or uint64_t, I am using some hash function for lists that I found on the internet a while ago (I don't recall where). It is as follows:
def __hash__(self):
    cdef block h
    cdef block* ptr
    h = <block>(self.start^(self.end<<16)^(self._len))
    ptr = <block *>self.data
    cdef size_t i
    h ^= FNV_offset
    for i from 0<=i<self._len:
        h ^= ptr[i]
        h *= FNV_prime
    if h==-1:
        return -2
    return h
where FNV_prime and FNV_offset depend on the architecture. I am surprised that this is faster than GMP's hash function.
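For reference, the constants being described are presumably the published FNV parameters (a 32-bit or 64-bit offset basis and prime, depending on the architecture). A minimal plain-Python sketch of the same xor-then-multiply FNV-1a loop, without the start/end/length mixing of the quoted Cython code:

```python
# Plain-Python sketch of an FNV-1a-style hash over a sequence of small
# integers. The constants are the published 64-bit FNV-1a parameters;
# a 32-bit build would use 0x811c9dc5 and 0x01000193 instead.
FNV_OFFSET_64 = 0xcbf29ce484222325  # FNV offset basis (64 bit)
FNV_PRIME_64 = 0x100000001b3        # FNV prime (64 bit)
MASK_64 = (1 << 64) - 1             # emulate unsigned 64-bit overflow

def fnv1a(seq):
    h = FNV_OFFSET_64
    for x in seq:
        h ^= x                            # xor in the next entry...
        h = (h * FNV_PRIME_64) & MASK_64  # ...then multiply by the prime
    return h

print(hex(fnv1a(b"a")))  # 0xaf63dc4c8601ec8c, the published test vector
```

This is only an illustration of the hashing scheme, not the ticket's actual code.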
comment:15 Changed 7 years ago by
 Description modified (diff)
 Summary changed from Improve efficiency of the path semigroup to Implement sequences of bounded integers
comment:16 Changed 7 years ago by
 Keywords sequence bounded integer added; quiver path algebra efficiency removed
As Nathann has mentioned, it would make sense to provide a general implementation of sequences of bounded integers (independent of quiver paths), and then quiver paths can inherit from it later. That's why I changed the ticket description.
comment:17 Changed 7 years ago by
 Dependencies #12630 deleted
comment:18 Changed 7 years ago by
For completeness, I have added Path_v6, which uses Python's tuple under the hood (which is an obvious way to implement paths, too).
Result:
sage: %timeit p5*q5
1000000 loops, best of 3: 439 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 542 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 612 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 869 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 1.1 us per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 1.39 us per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.41 us per loop
sage: %timeit p100*q1000
100000 loops, best of 3: 5.04 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 11 us per loop
sage: %timeit hash(p5)
1000000 loops, best of 3: 245 ns per loop
sage: %timeit hash(p25)
1000000 loops, best of 3: 532 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 884 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 1.63 us per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 14.6 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 885 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.71 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.63 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.53 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 39.3 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 785 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 829 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 799 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 791 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 822 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.48 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.72 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.63 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.5 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 116 us per loop
sage: %timeit q1000.startswith(q100)
100000 loops, best of 3: 4.7 us per loop
sage: %timeit q1000.startswith(r100)
100000 loops, best of 3: 4.72 us per loop
The comparison with the other implementations becomes this:
 Concatenation
                     5x5      50x50    1000x1000
Uncompressed long :  739 ns   885 ns   3.06 us
Uncompressed short:  700 ns   808 ns   2.13 us
Uncompressed char :  679 ns   772 ns   1.31 us    +
Compressed long   :  676 ns   759 ns   1.61 us
Compressed short  :  718 ns   722 ns   1.98 us    +
Semicompr. long   :  684 ns   827 ns   1.42 us
Integer           :  645 ns   1.82 us  1.78 us
GMP mpz_t         :  723 ns   799 ns   1.41 us
tuple             :  439 ns   1.1 us   11 us      +
 Hash
                     5        100      1000
Uncompressed long :  135 ns   333 ns   2.09 us    +
Uncompressed short:  138 ns   383 ns   2.61 us    +
Uncompressed char :  133 ns   374 ns   2.53 us    +
Compressed long   :  132 ns   155 ns   466 ns     +++
Compressed short  :  132 ns   206 ns   951 ns     +
Semicompr. long   :  148 ns   194 ns   594 ns
Integer           :  164 ns   274 ns   1.94 us
GMP mpz_t         :  159 ns   266 ns   1.93 us
tuple             :  245 ns   1.63 us  14.6 us
 == for equal paths
                     5        100      1000
Uncompressed long :  1.63 us  23 us    212 us
Uncompressed short:  1.08 us  14.9 us  137 us
Uncompressed char :  1.08 us  12.5 us  121 us
Compressed long   :  710 ns   4.4 us   38 us
Compressed short  :  723 ns   5.3 us   47.5 us
Semicompr. long   :  498 ns   712 ns   2.4 us
Integer           :  667 ns   658 ns   817 ns
GMP mpz_t         :  466 ns   485 ns   693 ns     +++
tuple             :  885 ns   4.5 us   39.3 us
 == for "easily" unequal paths
                     5        100      1000
Uncompressed long :  634 ns   686 ns   698 ns
Uncompressed short:  613 ns   603 ns   635 ns
Uncompressed char :  581 ns   599 ns   636 ns
Compressed long   :  692 ns   729 ns   752 ns
Compressed short  :  577 ns   588 ns   607 ns
Semicompr. long   :  499 ns   526 ns   545 ns
Integer           :  746 ns   731 ns   733 ns
GMP mpz_t         :  451 ns   468 ns   482 ns     +++
tuple             :  785 ns   791 ns   822 ns
 == for "difficult" unequal paths
                     5        100      1000
Uncompressed long :  1.65 us  22 us    213 us
Uncompressed short:  1.15 us  12.8 us  140 us
Uncompressed char :  1.06 us  12.7 us  121 us
Compressed long   :  706 ns   4.4 us   37.6 us
Compressed short  :  754 ns   5.3 us   47.3 us
Semicompr. long   :  506 ns   704 ns   2.5 us
Integer           :  710 ns   719 ns   755 ns
GMP mpz_t         :  448 ns   457 ns   470 ns     +++
tuple             :  1.48 us  12.5 us  116 us
 startswith
                     yes      no
Uncompressed long :  325 ns   311 ns
Uncompressed short:  336 ns   325 ns
Uncompressed char :  324 ns   315 ns
Compressed long   :  272 ns   257 ns    +
Compressed short  :  272 ns   256 ns    +
Semicompr. long   :  263 ns   232 ns    ++
Integer           :  2.46 us  2.45 us
GMP mpz_t         :  673 ns   657 ns
tuple             :  4.7 us   4.72 us
So, using tuples is best for concatenation of short paths, but that's the only case where it is not too slow.
comment:19 followup: ↓ 20 Changed 7 years ago by
Well, given the outcomes I guess it mostly depends on your own practical applications ?... :)
I also wonder (because I hate them) how much time is spent dealing with parents. But all your implementations have to go through that at the moment so it does not change your comparisons.
Well, I am back to work at long long last. With a computer AND a screen. And in Brussels, which counts for the good mood :P
Have fuuuuuuuuuuuuuun !
Nathann
comment:20 in reply to: ↑ 19 Changed 7 years ago by
Hi Nathann,
Replying to ncohen:
I also wonder (because I hate them) how much time is spent dealing with parents.
Well, at some point parents have to come into play, namely when I want to implement not just sequences of bounded integers, but elements of the path semigroup of a quiver.
The strategic question is: Should the data defining a path be stored in an attribute of that path (the attribute would be an instance of BoundedIntegerSequence), or should the path itself be an instance of BoundedIntegerSequence? In the former case, it would be perfectly fine to have sequences of bounded integers parentless; but I worry about the overhead of accessing the attribute. In the latter case, BoundedIntegerSequence should already be a subclass of sage.structure.element.Element (and have a parent), since Cython does not allow multiple inheritance.
I tend to the latter solution also for a different reason: Instances of BoundedIntegerSequence have to know the bound for their elements, because the data storage depends on that bound. For efficiency (see attachment), several other constants are derived from that bound and used in the algorithms.
Should each instance of BoundedIntegerSequence know about all these bounds? Or should there be a common parent for all sequences of integers that are bounded by a number B?
I tend to the latter. If you assign several ints (the aforementioned constants) to an instance of BoundedIntegerSequence, it will likely be more time-consuming (during initialisation) than storing just a single datum (namely the parent).
That said, looking up the constants via the parent would be more timeconsuming than looking up constants that are stored directly in the element.
Difficult....
comment:21 followup: ↓ 22 Changed 7 years ago by
Hellooooo Simon !
Hmmmm... All this is important to settle before implementing those BoundedIntegerSequence objects somewhere indeed.
What I thought you intended was to write a very low-level data structure somewhere in the misc/ folder. The kind of stuff that one does not use by mistake, the kind of stuff that one does not use without reading the manual first. So I thought it would not exactly check its input, trusting blindly the developer and being as efficient as possible.
Then, I thought you would have a higherlevel algebraic stuff with parents colors and blinking lights that would check their input indeed, and be what we want a userfriendly object to be.
But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want to create high-level objects in those functions or not ?
If you don't, then perhaps you would be better with only lowlevel objects that you know how to use, which would not have to waste time (if time is actually wasted) on highlevel problems.
And so I thought that you highlevel objects would have a variable which would be this lowlevel object, etc, etc...
What do you have in mind ?
Nathann
comment:22 in reply to: ↑ 21 ; followup: ↓ 23 Changed 7 years ago by
Hi Nathann,
Replying to ncohen:
What I thought you intended was to write a very lowlevel data structure somewhere in the misc/ folder.
I somehow do.
The kind of stuff that one does not use by mistake,
Why? It should be a replacement for tuples (thus, lowlevel), but it should ideally be (close to) a dropin replacement (thus, it doesn't matter whether one uses it by mistake).
Then, I thought you would have a higherlevel algebraic stuff with parents colors and blinking lights that would check their input indeed, and be what we want a userfriendly object to be.
Again ideally: Blinking colours should only be added when one really implements paths in quivers.
But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want do create highlevel objects in those functions or not ?
Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of BoundedIntegerSequence, and only in the end interpret them as paths, when the battle smoke has vanished.
And so I thought that you highlevel objects would have a variable which would be this lowlevel object, etc, etc...
Yes, probably that's better.
So, perhaps I should really do cdef class BoundedIntegerSequence(tuple) rather than cdef class BoundedIntegerSequence(MonoidElement).
But then, what to do with all these internally used constants that are needed even for the low-level functionality of BoundedIntegerSequence? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by B? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?
Or do you have a third solution for storing/accessing these constants efficiently?
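To make the trade-off concrete, here is a hypothetical plain-Python sketch (all names invented, not the ticket's actual code) of the "shared object" option: the constants derived from the bound B are computed once, and every sequence stores a single reference to that shared object, much like a parent:

```python
# Hypothetical sketch: one shared object per bound B holds the derived
# constants; each sequence keeps a single reference to it instead of
# caching every constant itself. All names here are invented.
class BoundedIntSpace:
    def __init__(self, bound):
        self.bound = bound
        self.item_bits = max(1, (bound - 1).bit_length())  # bits per entry
        self.items_per_block = 64 // self.item_bits        # entries per 64-bit block
        self.item_mask = (1 << self.item_bits) - 1         # extracts one entry

class BoundedIntSeq:
    def __init__(self, space, data):
        self.space = space   # one reference instead of several cached ints
        self.data = tuple(data)

space = BoundedIntSpace(256)   # shared by all sequences bounded by 256
seqs = [BoundedIntSeq(space, [i, i + 1]) for i in range(3)]
print(space.item_bits, space.items_per_block, hex(space.item_mask))  # 8 8 0xff
```

Creating many sequences then costs one pointer assignment each, at the price of one extra indirection whenever a constant is looked up, which is exactly the tension discussed above.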
comment:23 in reply to: ↑ 22 ; followup: ↓ 24 Changed 7 years ago by
Helloooooo !!
Why? It should be a replacement for tuples (thus, lowlevel), but it should ideally be (close to) a dropin replacement (thus, it doesn't matter whether one uses it by mistake).
Why do you want it to inherit from tuple ? I do not know the effect of such a thing, what it brings and what should be feared
Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of BoundedIntegerSequence, and only in the end interpret them as paths, when the battle smoke has vanished.
Well, yep. Only create the highlevel stuff with possible overhead when you need it. And you end up knowing exactly what your code does when you are writing the code of your timecritical function.
So, perhaps I should really do cdef class BoundedIntegerSequence(tuple) rather than cdef class BoundedIntegerSequence(MonoidElement).
Hmmm.. I still don't get why you want it to inherit from tuple. My natural move would be to implement this as a C structure, not even an object :P
But then, what to do with all these internally used constants that are needed even for the low-level functionality of BoundedIntegerSequence? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by B? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?
Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files. You could have a BoundedIntegerSequence class implementing no function at all, extended by all different implementations of the data structures you want. Even in the same file.
... And we don't need parents for that :PPPP
Nathann
comment:24 in reply to: ↑ 23 ; followup: ↓ 25 Changed 7 years ago by
Replying to ncohen:
Why? It should be a replacement for tuples (thus, lowlevel), but it should ideally be (close to) a dropin replacement (thus, it doesn't matter whether one uses it by mistake).
Why do you want it to inherit from tuple ?
By analogy to Sequence_generic (which inherits from list):

sage: S = Sequence([1,2,3])
sage: type(S)
<class 'sage.structure.sequence.Sequence_generic'>
sage: type(S).mro()
[sage.structure.sequence.Sequence_generic,
 sage.structure.sage_object.SageObject,
 list,
 object]
Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files.
No, that's a different situation. If you have a sequence S1 of integers bounded by B1 and a sequence S2 of integers bounded by B2, then the constants for S1 are different from the constants of S2. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file), but when you do, for example, for x in B1, then you'll need different constants than when you do for x in B2.
So, these constants are not shared by all sequences, but by all sequences that share the same bound. Thus, it seems natural (to me, at least) to have one object O(B) that is shared by all sequences of bound B, so that O(B) provides the constants that belong to B.
Sure, O(B) could be written in C as well. But why not make it a parent, when all we do is let each sequence have a pointer to O(B)?
comment:25 in reply to: ↑ 24 ; followup: ↓ 26 Changed 7 years ago by
By analogy to Sequence_generic (which inherits from list):
?...
Well. And why is that a good idea ? :P
No, that's a different situation. If you have a sequence S1 of integers bounded by B1 and a sequence S2 of integers bounded by B2, then the constants for S1 are different from the constants of S2. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file), but when you do, for example, for x in B1, then you'll need different constants than when you do for x in B2.
Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ? If it is, did you decide whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?
If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something :)
I'd say.... 4,8,16,32 ? :)
So, these constants are not shared by all sequences, but by all sequences that share the same bound. Thus, it seems natural (to me, at least) to have one object O(B) that is shared by all sequences of bound B, so that O(B) provides the constants that belong to B.
If your data structure is not an object, you could also add this width parameter as an input of the functions you use ? I guess all Bounded Sequence Integer that you use in the same function would have the same bound.
But admittedly I love to split hairs. Tell me if you think that this is going too far :P
Sure, O(B) could be written in C as well. But why not make it a parent, when all we do is let each sequence have a pointer to O(B)?
Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happy with a C pointer I can control, but not with a framework that I do not understand. If you understand it, then you know what you are doing. I don't :P
Nathann
comment:26 in reply to: ↑ 25 ; followups: ↓ 27 ↓ 30 Changed 7 years ago by
Replying to ncohen:
Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ?
No. We have
cdef Integer NumberArrows
cdef Integer IntegerMask
cdef size_t ArrowBitSize, BigArrowBitSize
cdef block* bitmask = NULL
cdef block DataMask, BitMask
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize
For example, ModBAPB is used to replace n%BigArrowsPerBlock by a bitwise and, which is noticeably faster.
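The modulus-to-mask trick works because the block capacity is (presumably) a power of two; a minimal sketch of the identity being exploited, with invented names:

```python
# When m is a power of two, n % m == n & (m - 1), so the remainder can be
# computed with a single bitwise and against a precomputed mask, which is
# presumably what a constant like ModBAPB caches. Names here are invented.
big_arrows_per_block = 16            # assumed power of two
mod_mask = big_arrows_per_block - 1  # plays the role of ModBAPB

for n in range(10000):
    assert n % big_arrows_per_block == n & mod_mask

print(bin(mod_mask))  # 0b1111
```

Precomputing mod_mask once (per bound) is exactly the kind of derived constant the surrounding discussion is about.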
If it is, did you decided whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?
Depending on the implementation...
If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something
:)
Isn't duplication of code supposed to smell?
If your data structure is not an object, you could also add this width parameter as an input of the functions you use ? I guess all bounded integer sequences that you use in the same function would have the same bound.
Do I understand correctly: You suggest that we should not provide a class BoundedIntegerSequence that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, mpz_t or unsigned long*) and can only be used in Cython code?
Then it would be better to revert this ticket to its original purpose: There would be...
- ... cdef functions operating on mpz_t resp. on unsigned long*,
- ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
- ... a subsequent ticket, implementing an F5-style algorithm to compute standard bases, operating not with Path but with mpz_t* resp. unsigned long** (which will be the case anyway).
But admittedly I love to split hairs. Tell me if you think that this is going too far
:P
No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.
Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happy with a C pointer I can control, but not with a framework that I do not understand.
Making it something like a container rather than a fully fledged parent would be an option, from my perspective.
comment:27 in reply to: ↑ 26 Changed 7 years ago by
Replying to SimonKing:
Making it something like a container rather than a fully fledged parent would be an option, from my perspective.
... I mean, for BoundedIntegerSequence. Of course, Path must have a parent (the path semigroup of a quiver).
comment:28 Changed 7 years ago by
Just a random thought. Some issues here are of the same nature as what was discussed around ClonableElement and its subclasses. Maybe there is stuff to share with it, especially if one ends up inheriting from Element.
comment:29 Changed 7 years ago by
I just had a look at ClonableElement, and one immediate thought is: Why are C pointers allocated in __init__? That's clearly the purpose of __cinit__, in particular if said C pointers are freed in __dealloc__.
Do we want to have sequences of bounded integers for use in Python? Then one possible way would be:
- have some inline cdef functions that operate on certain (templated?) C data structures
- use these in a subclass of ClonableElement (so that we can have sequences of bounded integers used in Python)
- use these in a Path class, to be used in path semigroups.
comment:30 in reply to: ↑ 26 ; followup: ↓ 31 Changed 7 years ago by
Yooooooo !!!
No. We have
cdef Integer NumberArrows
cdef Integer IntegerMask
cdef size_t ArrowBitSize, BigArrowBitSize
cdef block* bitmask = NULL
cdef block DataMask, BitMask
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize
Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?
For example, ModBAPB is used to replace n%BigArrowsPerBlock by a bitwise and, which is noticeably faster.
Yepyep of course.
If it is, did you decided whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?
Depending on the implementation...
Okay....
If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something
:)
Isn't duplication of code supposed to smell?
Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.
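Cython has no C++-style non-type template parameters, but the effect described here (one generic implementation, several specializations with a hardcoded constant each) can at least be illustrated in plain Python with closures that bake the constant in. This is an analogy only, with invented names, not the proposed implementation:

```python
# Analogy in plain Python: make_splitter "instantiates" a specialized
# function for each block size, with the shift and mask baked in, much
# like a template generates one type per constant. Names are invented.
def make_splitter(items_per_block):
    shift = items_per_block.bit_length() - 1  # assumes a power of two
    mask = items_per_block - 1
    def split(n):
        # (block index, offset within block) for the n-th entry
        return n >> shift, n & mask
    return split

split16 = make_splitter(16)   # one "instantiation" per constant
split32 = make_splitter(32)
print(split16(100))  # (6, 4)
print(split32(100))  # (3, 4)
```

In C++ the compiler would additionally constant-fold the shift and mask into the generated machine code, which is the optimization being discussed.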
Do I understand correctly: You suggest that we should not provide a class BoundedIntegerSequence that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, mpz_t or unsigned long*) and can only be used in Cython code?
That's what I had in mind, but you make decisions here, pick what you like.
It's a bit how bitsets are implemented. A C data structure, and an object wrapper on top of it. Or C Graphs, with a data structure on top of it. It is how Permutations should be implemented, too :P
Well, you have a C object that you use when you want performance, and when you don't care you use the higherlevel implementations.
Then it would be better to revert this ticket to its original purpose: There would be...
- ... cdef functions operating on mpz_t resp. on unsigned long*,
- ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
- ... a subsequent ticket, implementing an F5-style algorithm to compute standard bases, operating not with Path but with mpz_t* resp. unsigned long** (which will be the case anyway).
Well, I like this plan. What do you think ? Does it displease you in any way ?
No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.
I do not see how I could use it in my code at the moment, but I like the idea of having such a data structure somewhere, ready to be used. Most importantly, if you want speed, I think that having it available at two different levels is a safe bet. You write more code, but you will know how time is spent in your time-critical functions.
Nathann
comment:31 in reply to: ↑ 30 ; followup: ↓ 33 Changed 7 years ago by
Hi Nathann,
Replying to ncohen:
Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?
Sure. And you want to do this same computation 10^7 times?
Isn't duplication of code supposed to smell?
Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.
Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).
That's what I had in mind, but you make decisions here, pick what you like.
It's a bit how bitsets are implemented.
Right, that's what I have looked at, and think makes sense here. Up to the constants.
Then it would be better to revert this ticket to its original purpose: There would be...
- ... cdef functions operating on mpz_t resp. on unsigned long*,
- ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
- ... a subsequent ticket, implementing an F5-style algorithm to compute standard bases, operating not with Path but with mpz_t* resp. unsigned long** (which will be the case anyway).
Well, I like this plan. What do you think ? Does it displease you in any way ?
No, it is one possibility.
comment:32 Changed 7 years ago by
 Cc hivert added
comment:33 in reply to: ↑ 31 ; followup: ↓ 34 Changed 7 years ago by
Sure. And you want to do this same computation 10^7 times?
I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?
Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).
I am thinking of this : http://stackoverflow.com/questions/4840819/c-template-specialization-with-constant-value
That's C++, and I don't see how to do it in Cython, though. The point is that you can parameterize a data structure with a constant value, and the data structure will be compiled for each value of the constant *that appears in the code at compile time* (not all possible values, of course). The point is that the code is optimized while knowing the value (which can lead to optimization) and also that you can do things that would not be possible otherwise, like "cdef int a[k]" where k is a (templated) variable.
Nathann
comment:34 in reply to: ↑ 33 ; followup: ↓ 35 Changed 7 years ago by
Replying to ncohen:
Sure. And you want to do this same computation 10^7 times?
I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?
Apparently I am not used to templated code...
I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing it as attribute of the sequences) for each sequence.
I am thinking of this : http://stackoverflow.com/questions/4840819/ctemplatespecializationwithconstantvalue
That's C++, and I don't see how to do it in Cython, though.
Part of the question is: Do we want to use gmp as backend? Or do we want to (re)implement operations on densely packed long*
by ourselves? Given the timings, gmp is an option.
Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!
The next question then is how to access the code from Cython. I've done this with C, but not C++.
comment:35 in reply to: ↑ 34 ; followup: ↓ 37 Changed 7 years ago by
Apparently I am not used to templated code...
I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing it as attribute of the sequences) for each sequence.
Nonono. Somehow, you can think of templates as a way to "script" the generation of code. You write some code with a constant k which is never defined anywhere, and then you say "compile one version of this code for each k in 1, 2, 3, 4, 5, 6". You end up with several data structures implemented, each with a hardcoded value. And then you can create an object cdef LongCompressedPath<5> my_path in which the hardcoded bound is 2^5. You just have many additional types, each with its own constant.
Part of the question is: Do we want to use gmp as backend? Or do we want to (re)implement operations on densely packed long* by ourselves? Given the timings, gmp is an option.
Yep, you are right. Depends on how you weight the operations that you will need: it looks okay for concatenations and bad for startswith, but I expect that the most time-consuming operation will be concatenation ? You are the one who needs it :)
Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!
The next question then is how to access the code from Cython. I've done this with C, but not C++.
HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cythondevel (did you want to send one yourself) ?
Nathann
comment:36 Changed 7 years ago by
 Milestone changed from sage6.2 to sage6.3
comment:37 in reply to: ↑ 35 ; followup: ↓ 42 Changed 7 years ago by
Hi Nathann,
sorry for the long silence!
Replying to ncohen:
HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cythondevel (did you want to send one yourself) ?
Did you find out how templated code can be written in Cython? If you have asked on cythondevel (so far I have only posted on cythonusers) or have another pointer, then please tell me. If not, tell me also...
Cheers, Simon
comment:38 Changed 7 years ago by
Hi Nathann,
wouldn't the following be close to templates?
In Cython:
cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    return var+var

def show_me():
    cdef str a = 'a'
    cdef int b = 3
    print 'str', times_two(a)
    print 'int', times_two(b)
and this results in
sage: show_me()
str aa
int 6
comment:39 Changed 7 years ago by
Or perhaps a better example, showing that the fused type is used to choose code at compile time: Put the following into a cell of the notebook
%cython
cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var*2
    else:
        return var+var

def test():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)
and look at the resulting code. The line a = times_two(a) becomes
__pyx_v_a = __pyx_fuse_1__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_a, 0);
whereas b = times_two(b) becomes
__pyx_t_1 = __pyx_fuse_0__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_b, 0);
and one can check that Cython created two different C functions __pyx_fuse_0... and __pyx_fuse_1... for the two possible (type-dependent!) versions of times_two.
comment:40 followup: ↓ 41 Changed 7 years ago by
And here an example showing that it pays off, speedwise:
In a .pyx file:
cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+var

cpdef untyped_times_two(var):
    return var+var

cpdef int int_times_two(int var):
    return var+var

cpdef str str_times_two(str var):
    return var+var

def test1():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)

def test2():
    cdef int a = 3
    cdef str b = 'b'
    a = untyped_times_two(a)
    b = untyped_times_two(b)

def test3():
    cdef int a = 3
    cdef str b = 'b'
    a = int_times_two(a)
    b = str_times_two(b)
Attach the .pyx file to Sage, and get this:
sage: %attach /home/king/Sage/work/templates/test.pyx
Compiling /home/king/Sage/work/templates/test.pyx...
sage: %timeit test1()
10000000 loops, best of 3: 159 ns per loop
sage: %timeit test2()
1000000 loops, best of 3: 199 ns per loop
sage: %timeit test3()
10000000 loops, best of 3: 168 ns per loop
EDIT: I have replaced the original example by something fairer, where it says var+var everywhere, but in the templated version Cython can choose the C implementation of var+var at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.
EDIT 2: The new example also shows that choosing a specialised implementation manually is not faster than Cython using templates.
comment:41 in reply to: ↑ 40 ; followup: ↓ 43 Changed 7 years ago by
Hellooooooooooooooooo !!!
EDIT: I have replaced the original example by something fairer, where it says var+var everywhere, but in the templated version Cython can choose the C implementation of var+var at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.
HMmmmmm... Does it make any difference? gcc should resolve those "if" at compile time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary, as the constants involved can be defined at compile time.
Anyway, that's a good news ! :D
Nathann
comment:42 in reply to: ↑ 37 Changed 7 years ago by
Yooooooooooo !!
HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cythondevel (did you want to send one yourself) ?
Did you find out how templated code can be written in Cython? If you have asked on cythondevel (so far I have only posted on cythonusers) or have another pointer, then please tell me. If not, tell me also...
Ahahah.. Well, no ... When I ask a question I do not do anything before I get an answer, soooooooooo I never did it :P
I ask a question, then forget everything about it. When I get the answer I do what should be done :P
Nathann
comment:43 in reply to: ↑ 41 ; followup: ↓ 44 Changed 7 years ago by
Hi Nathann,
Replying to ncohen:
HMmmmmm... Does it make any difference? gcc should resolve those "if" at compile time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary, as the constants involved can be defined at compile time.
Yes, it is indeed resolved at compile time. Two C functions are created, and only the part of the code relevant for the respective branch appears in each C function.
In other words, I now plan to create cdef functions like this:
ctypedef fused seq_t:
    first_implementation
    second_implementation

cdef seq_t concat(seq_t x, seq_t y):
    if seq_t is first_implementation:
        <do stuff for 1st implementation>
    else:
        <do stuff for 2nd implementation>
and then, in other modules (perhaps also in src/sage/combinat/words/word_datatypes.pyx?), one could uniformly call concat(a,b), provided that the types of a and b are known at compile time (or write concat[first_implementation](a,b) explicitly).
I think that such Cython code will be sufficiently nice to read, and the resulting Ccode will be sufficiently fast.
Anyway, that's a good news !
:D
Indeed!
Simon
comment:44 in reply to: ↑ 43 Changed 7 years ago by
I think that such Cython code will be sufficiently nice to read, and the resulting Ccode will be sufficiently fast.
+1
Nathann
comment:45 Changed 7 years ago by
Hmmm. Meanwhile I am a bit less positive about fused types. Let seq_t be a fused type for different implementations.
 Fused types can not be used in iterators. I.e., the following is refused:
def iter_seq(seq_t S):
    for <bla>:
        <do something>
        yield x
 Fused types can not be used as attributes of cdef classes. I.e., the following is refused:
cdef class BoundedIntSeq:
    cdef seq_t seq
    def __mul__(self, other):
        <do type check>
        cdef seq_t out = concat_seq(self.seq, other.seq)
        <create and return new instance of BoundedIntSeq using "out">
Hence, it seems that one needs to create two separate iterators, and it also seems that one needs code duplication. I.e., when impl1 and impl2 are two specialisations of the fused type seq_t, then one has to have
cdef class BoundedIntSeq_impl1:
    cdef impl1 seq
    def __mul__(self, other):
        ...
        cdef impl1 out = concat_seq(self.seq, other.seq)

cdef class BoundedIntSeq_impl2:
    cdef impl2 seq
    def __mul__(self, other):
        ...
        cdef impl2 out = concat_seq(self.seq, other.seq)
This mainly amounts to cut-and-paste, but is ugly!
I think I'll ask on cythonusers if there are known ideas to solve this more elegantly.
comment:46 Changed 7 years ago by
Some variation on the theme: One can create fused types from two cdef classes, and then one can work with specialisations of functions as follows:
ctypedef fused str_or_int:
    str
    int

# make the function do different things on different types,
# to better see the difference
cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+'+'+var

# forward declaration of cdef classes and their fusion
cdef class Bar_int
cdef class Bar_str

cdef fused Bar:
    Bar_int
    Bar_str

# a function that can be specialised to either of the two extension classes
def twice(Bar self):
    # and inside, we specialise a function depending
    # on the type of an attribute, at compile time!!
    return type(self)(times_two(self.data))

cdef class Bar_str:
    cdef str data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return self.data
    # First possibility: Use the "templated" function as a method
    twice_ = twice[Bar_str]
    # second possibility: Wrap the "templated" function
    cpdef Bar_str twice__(self):
        return twice(self)

# The same, with the other specialisation of <Bar>
cdef class Bar_int:
    cdef int data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "int(%d)"%self.data
    twice_ = twice[Bar_int]
    cpdef Bar_int twice__(self):
        return twice(self)
Timewise, we see that wrapping the "templated" function is slower than using it as an attribute:
sage: b1 = Bar_str('abc')
sage: b1.twice_()
abc+abc
sage: b1.twice__()
abc+abc
sage: %timeit b1.twice_()
1000000 loops, best of 3: 759 ns per loop
sage: %timeit b1.twice__()
100000 loops, best of 3: 13.7 µs per loop
sage: b2 = Bar_int(5)
sage: b2.twice_()
int(10)
sage: b2.twice__()
int(10)
sage: %timeit b2.twice_()
1000000 loops, best of 3: 539 ns per loop
sage: %timeit b2.twice__()
100000 loops, best of 3: 9.2 µs per loop
Summary and further comments
We can reduce the code duplication to something like this:
<define templated function foo_ depending on a fused type Bar with specialisations Bar1, Bar2, ...>

cdef class Bar1:
    foo = foo_[Bar1]

cdef class Bar2:
    foo = foo_[Bar2]

...
I hope this level of copy-and-paste can be afforded. At least, it gives good speed. In my tests I found the following:
- It is impossible to make foo_ a cpdef function. Otherwise, the assignment foo = foo_ fails with the statement that foo_ can not be turned into a Python type.
- It is impossible to choose the same name for the templated function and for the method it is turned into. Doing
      class Bar1:
          foo_ = foo_
  results in an error saying that Bar1 has no attribute foo_.
- With the syntax above, both twice_ and twice__ have been put into the class' __dict__. I did not succeed to turn it into the equivalent of a cdef method.
comment:47 Changed 7 years ago by
I did a lot of benchmark tests for the different implementations (using long* or char* or mpz_t to store sequences of bounded integers, using cdef functions that operate on fused types) and eventually found that using GMP (i.e., mpz_t) is fastest in all tests, by a wide margin.
Why did I not find this result before? Well, part of the reason is that GMP sometimes provides different functions to do the same job, and in the past few days I considerably improved my usage of the various possibilities. For example, mpz_t can be initialised in different ways. If one uses mpz_init, then the assignment that will follow results in a reallocation. Since the length of the concatenation of two tuples is known in advance, prescribing the number of to-be-allocated bits with mpz_init2 resulted in a speedup of concatenation; actually it became twice as fast.
And this result is not surprising. After all, GMP has to implement the same bit-shift and comparison operations for bit arrays that I was implementing in the "long*" approach, too. But GMP probably uses implementation techniques that I can not even dream of.
Now, my plan is to
- create some ctypedef struct bounded_int_tuple that is based on mpz_t and is the underlying C structure for tuples of bounded integers,
- provide a collection of cdef (inline) functions, operating on bounded_int_tuple, for concatenation, copying, comparison and so on,
- wrap it in a cdef class BoundedIntegerTuple. This will probably not derive from Element: I think it makes no difference from the point of view of memory efficiency whether each BoundedIntegerTuple stores a pointer to the parent of "all tuples of integers bounded by B" or directly stores B as an unsigned int.
In a subsequent ticket, I plan to use it for quiver paths, and if the combinat crowd cares, they can use it to make parts of sage.combinat.words a lot faster.
Comments? Requests? Suggestion of alternatives?
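[Editorial note] The bit-packing idea behind this plan can be sketched in plain Python, with an arbitrary-precision int standing in for the mpz_t bit array. This is only an illustration, not the ticket's Cython code; the helper names pack/unpack/concat are made up:

```python
# Pure-Python sketch (not the ticket's code) of the bit-packing idea:
# a sequence of integers < 2^itembitsize is stored in one big integer.

def pack(items, itembitsize):
    """Pack a list of bounded integers into a pair (data, length)."""
    data = 0
    for i, x in enumerate(items):
        assert 0 <= x < (1 << itembitsize)
        data |= x << (i * itembitsize)   # item i lives at bit offset i*itembitsize
    return data, len(items)

def unpack(data, length, itembitsize):
    """Recover the list of items from the packed representation."""
    mask = (1 << itembitsize) - 1
    return [(data >> (i * itembitsize)) & mask for i in range(length)]

def concat(s, t, itembitsize):
    """Concatenation is a single shift plus OR, analogous to mpz_mul_2exp."""
    (sd, sl), (td, tl) = s, t
    return sd | (td << (sl * itembitsize)), sl + tl

s = pack([3, 1, 7], 3)
t = pack([2, 5], 3)
assert unpack(*concat(s, t, 3), itembitsize=3) == [3, 1, 7, 2, 5]
```

The shift-and-OR in concat is why preallocating the result (mpz_init2) pays off: the final bit length is known before the operation runs.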
comment:48 followup: ↓ 51 Changed 7 years ago by
None from me. I keep being impressed at how much testing/researching you put into this quiver features... I really wonder what you will do with it in the end ;)
Nathann
comment:49 Changed 7 years ago by
 Branch set to u/SimonKing/ticket/15820
 Created changed from 02/14/14 17:31:31 to 02/14/14 17:31:31
 Modified changed from 05/25/14 15:39:08 to 05/25/14 15:39:08
comment:50 Changed 7 years ago by
 Commit set to 5dc78c5f5f647bf95b98da916d7086fae539ffb8
 Status changed from new to needs_review
I have pushed a branch with an initial implementation of bounded integer sequences. I took care of error handling: I think you will find that those GMP commands that may result in a memory error are wrapped in sig_on()/sig_off(), and the cdef boilerplate functions do propagate errors. In fact, in some of my tests, I was hitting Ctrl-C, and it worked.
The rest of this post is about benchmarks. I compare against Python tuples, which is of course ambitious, and you will find that tuples are not slower than bounded integer sequences in all situations. However, one should see it positively: the bounded integer sequences implemented here have features very similar to tuples, and there are operations for which they are a lot faster than tuples. It all depends on the length and the bound of the sequence. That's why I think that it is a reasonable contribution. And to make it faster (without Python classes), one can still use the cdef boilerplate functions. Generally, it seems that long bounded integer sequences are faster than long tuples, but short tuples are faster than short sequences. For hash, bounded integer sequences are pretty good, but they suck in accessing items, or in slicing with a step different from 1.
In contrast to tuples, bounded integer sequences also provide fairly quick tests for whether a sequence starts with a given sequence, or whether a sequence contains a certain subsequence. This is a feature I took from strings.
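[Editorial note] These string-like tests are cheap on a packed representation: with the whole sequence in one integer, startswith is a single mask-and-compare, and containment is a loop of shifted comparisons. A hedged pure-Python sketch (the helpers pack/startswith/contains are illustrative names, not the ticket's API):

```python
# Pure-Python sketch of the string-like operations on a packed sequence.

def pack(items, bits):
    """Pack bounded integers (< 2^bits each) into one big integer."""
    data = 0
    for i, x in enumerate(items):
        data |= x << (i * bits)
    return data, len(items)

def startswith(s, prefix, bits):
    """Mask off the first len(prefix) items and compare in one step."""
    (sd, sl), (pd, pl) = s, prefix
    if pl > sl:
        return False
    mask = (1 << (pl * bits)) - 1
    return sd & mask == pd

def contains(s, sub, bits):
    """Shift the sequence item by item and compare against the pattern."""
    (sd, sl), (ud, ul) = s, sub
    mask = (1 << (ul * bits)) - 1
    return any((sd >> (i * bits)) & mask == ud
               for i in range(sl - ul + 1))

S = pack([3, 1, 7, 2, 5], 3)
assert startswith(S, pack([3, 1], 3), 3)
assert not startswith(S, pack([1, 3], 3), 3)
assert contains(S, pack([7, 2], 3), 3)
assert not contains(S, pack([2, 7], 3), 3)
```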
Here are the tests, with different sizes and bounds. Variable names starting with "T" hold tuples, those starting with "S" hold bounded integer sequences.
sage: from sage.structure.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
Iteration
sage: timeit("x=[y for y in T0]", number=1000000) 1000000 loops, best of 3: 783 ns per loop sage: timeit("x=[y for y in T1]", number=1000000) 1000000 loops, best of 3: 1.48 µs per loop sage: timeit("x=[y for y in T2]", number=100000) 100000 loops, best of 3: 4.3 µs per loop sage: timeit("x=[y for y in T3]", number=1000) 1000 loops, best of 3: 245 µs per loop sage: timeit("x=[y for y in S0]", number=1000000) 1000000 loops, best of 3: 2.38 µs per loop sage: timeit("x=[y for y in S1]", number=100000) 100000 loops, best of 3: 4.1 µs per loop sage: timeit("x=[y for y in S2]", number=100000) 100000 loops, best of 3: 10.1 µs per loop sage: timeit("x=[y for y in S3]", number=1000) 1000 loops, best of 3: 1.79 ms per loop
Slicing
Bounded integer sequences are immutable and hence copied by identity. But let us do slicing, dropping the last item:
sage: timeit("x=T3[:1]", number=100000) 100000 loops, best of 3: 19.5 µs per loop sage: timeit("x=S3[:1]", number=100000) 100000 loops, best of 3: 3.48 µs per loop
Slicing with step != 1 is much slower, though:
sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=1000)
1000 loops, best of 3: 2.23 ms per loop
Perhaps I should mark it "TODO"...
Accessing single items
Short sequences
sage: timeit("x=T0[1]", number=1000000) 1000000 loops, best of 3: 361 ns per loop sage: timeit("x=T0[4]", number=1000000) 1000000 loops, best of 3: 373 ns per loop sage: timeit("x=S0[1]", number=1000000) 1000000 loops, best of 3: 959 ns per loop sage: timeit("x=S0[4]", number=1000000) 1000000 loops, best of 3: 960 ns per loop
Large sequences (time also depends on the bound for the integer sequence!)
sage: timeit("x=T3[1]", number=1000000) 1000000 loops, best of 3: 359 ns per loop sage: timeit("x=T3[4500]", number=1000000) 1000000 loops, best of 3: 382 ns per loop sage: timeit("x=S3[1]", number=1000000) 1000000 loops, best of 3: 1.97 µs per loop sage: timeit("x=S3[4500]", number=1000000) 1000000 loops, best of 3: 1.49 µs per loop
Comparison
Note that comparison of bounded integer sequences works differently from comparison of tuples or lists, as detailed in the documentation.
We compare sequences that are equal but nonidentical, that differ in early items, or that differ in late items.
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: T0x = tuple(L0x); T1x = tuple(L1x); T2x = tuple(L2x); T3x = tuple(L3x)
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: T0y = tuple(L0); T1y = tuple(L1); T2y = tuple(L2); T3y = tuple(L3)
sage: S1y = BoundedIntegerSequence(8, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
Early differences:
sage: timeit("T0==T0x", number=1000000) 1000000 loops, best of 3: 143 ns per loop sage: timeit("T1==T1x", number=1000000) 1000000 loops, best of 3: 145 ns per loop sage: timeit("T2==T2x", number=1000000) 1000000 loops, best of 3: 143 ns per loop sage: timeit("T3==T3x", number=1000000) 1000000 loops, best of 3: 161 ns per loop sage: timeit("S0==S0x", number=1000000) 1000000 loops, best of 3: 538 ns per loop sage: timeit("S1==S1x", number=1000000) 1000000 loops, best of 3: 550 ns per loop sage: timeit("S2==S2x", number=1000000) 1000000 loops, best of 3: 490 ns per loop sage: timeit("S3==S3x", number=1000000) 1000000 loops, best of 3: 559 ns per loop
Equal sequences:
sage: timeit("T0==T0y", number=1000000) 1000000 loops, best of 3: 169 ns per loop sage: timeit("T1==T1y", number=1000000) 1000000 loops, best of 3: 255 ns per loop sage: timeit("T2==T2y", number=1000000) 1000000 loops, best of 3: 597 ns per loop sage: timeit("T3==T3y", number=100000) 100000 loops, best of 3: 47.8 µs per loop sage: timeit("S0==S0y", number=1000000) 1000000 loops, best of 3: 511 ns per loop sage: timeit("S1==S1y", number=1000000) 1000000 loops, best of 3: 493 ns per loop sage: timeit("S2==S2y", number=1000000) 1000000 loops, best of 3: 583 ns per loop sage: timeit("S3==S3y", number=1000000) 1000000 loops, best of 3: 1.41 µs per loop
Late differences:
sage: T0z1 = T0+T0
sage: T0z2 = T0+T0x
sage: T1z1 = T1+T1
sage: T1z2 = T1+T1x
sage: T2z1 = T2+T2
sage: T2z2 = T2+T2x
sage: T3z1 = T3+T3
sage: T3z2 = T3+T3x
sage: timeit("T0z1==T0z2", number=100000)
100000 loops, best of 3: 206 ns per loop
sage: timeit("T1z1==T1z2", number=100000)
100000 loops, best of 3: 308 ns per loop
sage: timeit("T2z1==T2z2", number=100000)
100000 loops, best of 3: 640 ns per loop
sage: timeit("T3z1==T3z2", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z1==S0z2", number=100000)
100000 loops, best of 3: 585 ns per loop
sage: timeit("S1z1==S1z2", number=100000)
100000 loops, best of 3: 494 ns per loop
sage: timeit("S2z1==S2z2", number=100000)
100000 loops, best of 3: 555 ns per loop
sage: timeit("S3z1==S3z2", number=100000)
100000 loops, best of 3: 578 ns per loop
Hash
This is the only operation for which bounded integer sequences seem to be consistently faster than tuples:
sage: timeit("hash(T0)", number=100000) 100000 loops, best of 3: 179 ns per loop sage: timeit("hash(T0)", number=1000000) 1000000 loops, best of 3: 177 ns per loop sage: timeit("hash(T1)", number=1000000) 1000000 loops, best of 3: 267 ns per loop sage: timeit("hash(T2)", number=1000000) 1000000 loops, best of 3: 584 ns per loop sage: timeit("hash(T3)", number=100000) 100000 loops, best of 3: 45.3 µs per loop sage: timeit("hash(S0)", number=1000000) 1000000 loops, best of 3: 145 ns per loop sage: timeit("hash(S1)", number=1000000) 1000000 loops, best of 3: 153 ns per loop sage: timeit("hash(S2)", number=1000000) 1000000 loops, best of 3: 194 ns per loop sage: timeit("hash(S3)", number=1000000) 1000000 loops, best of 3: 8.97 µs per loop
Concatenation
sage: timeit("T0+T0", number=1000000) 1000000 loops, best of 3: 200 ns per loop sage: timeit("T1+T1", number=1000000) 1000000 loops, best of 3: 311 ns per loop sage: timeit("T2+T2", number=1000000) 1000000 loops, best of 3: 742 ns per loop sage: timeit("T3+T3", number=10000) 10000 loops, best of 3: 40.3 µs per loop sage: timeit("S0+S0", number=1000000) 1000000 loops, best of 3: 576 ns per loop sage: timeit("S1+S1", number=1000000) 1000000 loops, best of 3: 590 ns per loop sage: timeit("S2+S2", number=1000000) 1000000 loops, best of 3: 618 ns per loop sage: timeit("S3+S3", number=1000000) 1000000 loops, best of 3: 2.17 µs per loop
Subsequences
Recall the definition of S0z1 etc. We find:
sage: S0z2.startswith(S0)
True
sage: S0z2.startswith(S0x)
False
sage: S0x in S0z2
True
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 239 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 241 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 694 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 227 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 1.08 µs per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 247 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 3.21 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 989 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 218 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 3.57 ms per loop
I wonder if the latter could be improved. Another "TODO"...
New commits:
5dc78c5  Implement sequences of bounded integers

comment:51 in reply to: ↑ 48 Changed 7 years ago by
Replying to ncohen:
None from me. I keep being impressed at how much testing/researching you put into this quiver features... I really wonder what you will do with it in the end
;)
 Gröbner bases for modules over path algebra quotients.
 In particular, an implementation of the noncommutative version of Faugère's F5 algorithm that I have described in my latest article.
 Compute minimal generating sets for modules over socalled "basic algebras" (that's a special type of path algebra quotients), which is possible with the noncommutative F5, as shown in my latest article.
Use all this to compute minimal projective resolutions of basic algebras, but please be faster than what we currently do in our optional group cohomology spkg. Currently, we use a method of David Green for computing the minimal resolutions; but F5 should be more efficient, theoretically...
That said: I guess it would make sense to adopt bounded integer sequences in appropriate parts of sage.combinat.words, but I leave this to others.
comment:52 followup: ↓ 53 Changed 7 years ago by
Hellooooooo Simon !
Sorry but I only began to read your code and now I should really go get some sleep. I paste my current notes here in the meantime, but I am a long way from a proper review job.
bounded integer sequence, biseq. What about "integer sequence"? All integers are "bounded" on a computer. Why add this "bounded" everywhere? biseq -> iseq ?
# Bitsize of "data" -> bitsize of the whole sequence
itembitsize -> necessarily a power of 2 ?
#cdef inline void dealloc_biseq(biseq_t S): why there if commented ?
list2biseq (and others) -> list_to_biseq ?
 Comments on the function definitions > Move it to the function's doc, even if it is their only content ?
 Could you write some doc, even if it is just one line, for cdef functions ? A bit like what we have for bitsets ? Something like
cdef biseq_t* allocate_biseq(size_t l, unsigned long int itemsize):
    """
    Allocate a sequence of l integers stored on itemsize bits each.
    """
 I just figured out that you may have wanted to keep this list of functions above with short descriptions as an 'index' of what the file does. If so, it would be cool to do it in the html doc instead ! It is nice to have documentation for C functions even if you have to do it by hand.. An example there : http://www.sagemath.org/doc/reference/graphs/sage/graphs/base/static_sparse_graph.html
list2biseq -> I wondered why this function wouldn't return a new biseq built from a list instead of modifying an existing one... Performance issue? Is that critical in some cases?
Nathann
comment:53 in reply to: ↑ 52 Changed 7 years ago by
Replying to ncohen:
 bounded integer sequence, biseq. What about "integer sequence" ?
Not good, because...
All integers are "bounded" on a computer. Why add this "bounded" everywhere ? biseq -> iseq
... the computer's integer bound is chosen when you buy the machine, but not when you create the sequence.
The point is that each "bounded integer sequence" has a bound B that can be chosen upon initialisation so that any item of the sequence is smaller than B.
# Bitsize of "data" -> bitsize of the whole sequence
OK.
itembitsize -> necessarily a power of 2 ?
No. Here, it is really "bit" size. So, the number tells how many bits will be reserved to store one item. Hence, the bound B above is a power of two, namely 2^itembitsize.
#cdef inline void dealloc_biseq(biseq_t S): why there if commented ?
Because I forgot to remove it.
list2biseq (and others) -> list_to_biseq ?
Acceptable.
 Comments on the function definitions > Move it to the function's doc, even if it is their only content ?
 Could you write some doc, even if it is just one line, for cdef functions ?
I wouldn't remove the comments from the pxd file, but certainly some documentation should be added to the cdef functions in the pyx file. I wonder if this should be by comments in the code or by a doc string. After all, the doc string will not be visible (cdef functions won't appear in the docs, as they can not be imported).
 I just figured out that you may have wanted to keep this list of functions above with short descriptions as an 'index' of what the file does. If so, it would be cool to do it in the html doc instead !
You mean: "This file provides the following boilerplate functions, that you can cimport for usage in Cython code: ..."?
list2biseq -> I wondered why this function wouldn't return a new biseq built from a list instead of modifying an existing one... Performance issue? Is that critical in some cases?
This is motivated by my usage in the BoundedIntegerSequence wrapper. As usual, there is __cinit__, which does memory allocations (as much as I know, it is recommended to not do this kind of thing in __init__). Hence, when __init__ is called (and I think filling data into the allocated memory is the job of __init__, not of __cinit__), then the integer sequence already is allocated, and just needs to be filled.
comment:54 followup: ↓ 55 Changed 7 years ago by
Sorry if that's a stupid question (I only glanced through the code), but why are you using mpz's and not mpn's? It looks like you are already doing most if not all memory management by hand...
comment:55 in reply to: ↑ 54 ; followup: ↓ 56 Changed 7 years ago by
Replying to mmezzarobba:
Sorry if that's a stupid question (I only glanced through the code), but why are you using mpz's and not mpn's?
Simply because I did not succeed in finding what functions are available for mpn_t. So, if you can give me a pointer (similar to the documentation for mpz_t), I'd appreciate it. Probably unsigned types will be a tad faster than signed types.
It looks like you are already doing most if not all memory management by hand...
What would be the alternative to doing so? I mean, what I need is some tool to quickly operate on large bit arrays by shift operations, preferably also with a good and fast hash function, and without the need to think about the number of bits fitting into a byte or a long. I tried to implement this myself, using char* or long* (which would also mean taking care of allocating and freeing), but mpz_t turns out to be faster.
So, if you know another tool, that is perhaps more directly dedicated to bit arrays, then please tell!
comment:56 in reply to: ↑ 55 ; followup: ↓ 57 Changed 7 years ago by
Replying to SimonKing:
Simply because I did not succeed in finding what functions are available for mpn_t. So, if you can give me a pointer (similar to the documentation for mpz_t), I'd appreciate.
The documentation for mpn_* is at https://gmplib.org/manual/Low_002dlevelFunctions.html
It looks like you are already doing most if not all memory management by hand...
What would be the alternative to doing so?
I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of mpz_t over mpn_* when you are dealing with nonnegative integers (hence my first question).
comment:57 in reply to: ↑ 56 ; followup: ↓ 58 Changed 7 years ago by
Replying to mmezzarobba:
The documentation for mpn_* is at https://gmplib.org/manual/Low_002dlevelFunctions.html
Thank you, I'll have a look and will try to compare.
It looks like you are already doing most if not all memory management by hand...
What would be the alternative to doing so?
I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of mpz_t over mpn_* when you are dealing with nonnegative integers (hence my first question).
Now I understand what you mean by your statement that I did all memory management by hand: I am not using mpz_init but mpz_init2, telling exactly how many bits I will need. I found that it has a noticeable impact on performance, since otherwise reallocations will happen internally.
comment:58 in reply to: ↑ 57 Changed 7 years ago by
Replying to SimonKing:
Now I understand what you mean by your statement that I did all memory management by hand: I am not using mpz_init but mpz_init2
Yes, exactly. Sorry if I wasn't clear!
comment:59 Changed 7 years ago by
 Commit changed from 5dc78c5f5f647bf95b98da916d7086fae539ffb8 to efef80bea22840f618335b7dd6f5d3e9a64fa301
comment:60 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues set to Document cdef functions
I have at some point merged the latest beta. Sorry, I keep forgetting that merging stuff prematurely is bad. Main point was to relocate the code from sage.structure to sage.misc.
I don't think I will use mpn_*. It just seems too low-level. For example, I see mpn_lshift and mpn_rshift operations, but they operate on limbs, not on bit arrays. Hence, in order to implement a bit shift on bit arrays (which is what I need in concatenation and many other operations), I'd still need to invest a lot of work, essentially what I have done with the proof-of-concept using long* to store bit arrays. I doubt that in the end it would be any faster than using mpz_mul_2exp and friends.
Documentation of the cdef functions is still missing, this is what I intend to do next.
comment:61 followup: ↓ 62 Changed 7 years ago by
If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n^2) instead of O(n). This makes the implementation rather useless for a lot of applications where a datatype like this would be useful (say, if you want to store a million small integers in a memory-efficient way). With mpn functions, you can get the optimal complexity.
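[Editorial note] Fredrik's complexity point can be illustrated in pure Python by storing the items in fixed-size 64-bit "limbs", mpn-style: reading item i then touches at most two words, i.e. O(1), rather than shifting the whole number as a single mpz-style integer would. This is a sketch with assumed names (store/getitem), not actual mpn code:

```python
# mpn-style layout sketch: items of `bits` bits each, packed into a list of
# 64-bit words ("limbs"); an item may straddle a limb boundary.

LIMB = 64

def store(items, bits):
    """Pack the items into a list of 64-bit limbs."""
    limbs = [0] * ((len(items) * bits + LIMB - 1) // LIMB)
    for i, x in enumerate(items):
        pos, off = divmod(i * bits, LIMB)
        limbs[pos] |= (x << off) & ((1 << LIMB) - 1)
        if off + bits > LIMB:             # item straddles a limb boundary
            limbs[pos + 1] |= x >> (LIMB - off)
    return limbs

def getitem(limbs, i, bits):
    """O(1) access: read at most two limbs, independent of sequence length."""
    pos, off = divmod(i * bits, LIMB)
    x = limbs[pos] >> off
    if off + bits > LIMB:                 # pick up the spilled high bits
        x |= limbs[pos + 1] << (LIMB - off)
    return x & ((1 << bits) - 1)

L = [k % 31 for k in range(100)]
limbs = store(L, 5)
assert [getitem(limbs, i, 5) for i in range(100)] == L
```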
comment:62 in reply to: ↑ 61 Changed 7 years ago by
 Work issues changed from Document cdef functions to Document cdef functions. Pickling
Replying to fredrik.johansson:
If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n^2) instead of O(n).
Right, iteration, access to single items and creation from a list is not as fast as it could be. Can you point me to examples (in Sage? Elsewhere?) showing how to use mpn_*?
PS: I also forgot to implement pickling.
comment:63 Changed 7 years ago by
 Commit changed from efef80bea22840f618335b7dd6f5d3e9a64fa301 to 64ac7bab9d4c85136e9845878d69d82be7c300f8
Branch pushed to git repo; I updated commit sha1. New commits:
64ac7ba  Pickling of bounded integer sequence. Documentation of cdef functions

comment:64 Changed 7 years ago by
 Work issues changed from Document cdef functions. Pickling to Use mpn_* for speedup of iteration and item access
I have documented the cdef functions, and I have implemented pickling. Again (at least for long sequences), this is faster than for ordinary tuples:
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: %timeit loads(dumps(S))
1000 loops, best of 3: 407 µs per loop
sage: T = tuple(S)
sage: %timeit loads(dumps(T))
100 loops, best of 3: 2.02 ms per loop
I hope there is a nice example for mpn_* somewhere.
comment:65 Changed 7 years ago by
Hooray, using mpn_* improves the time to access item 2000 in an integer sequence of bound 32 from 1.49 µs to 545 ns. With tuples, the access time is 362 ns. So, that's almost competitive.
comment:66 Changed 7 years ago by
Another hooray: Conversion of a list of 5000 integers to a bounded integer sequence of bound 32 improved from 3.25 ms to 75.3 µs. Conversion to a tuple only takes 19.6 µs, but I doubt that this will be possible to beat.
comment:67 Changed 7 years ago by
 Commit changed from 64ac7bab9d4c85136e9845878d69d82be7c300f8 to 836f63fb96241033953e42d6d18becfaa4fe33b7
comment:68 Changed 7 years ago by
There remains to improve computation of the index of an item or of a subsequence, and slicing. When this is done, I'll provide updated benchmarks.
comment:69 Changed 7 years ago by
 Commit changed from 836f63fb96241033953e42d6d18becfaa4fe33b7 to c8a299ba54fa05d6851fd492019d9996e7e31515
Branch pushed to git repo; I updated commit sha1. New commits:
c8a299b  More documentation of bounded integer sequences

comment:70 Changed 7 years ago by
 Commit changed from c8a299ba54fa05d6851fd492019d9996e7e31515 to 735939e593c9c302ff42c4973cbfe2e48eb22644
comment:71 Changed 7 years ago by
 Status changed from needs_work to needs_review
Using the mpn_* functions was a very good idea. Some basic operations are a lot faster now. Here I am repeating (and slightly extending) my earlier benchmarks for those operations that have changed in the recent commits:
The test bed
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
Conversion list -> tuple/sequence
sage: %timeit x = tuple(L0)
1000000 loops, best of 3: 307 ns per loop
sage: %timeit x = tuple(L1)
1000000 loops, best of 3: 369 ns per loop
sage: %timeit x = tuple(L2)
1000000 loops, best of 3: 534 ns per loop
sage: %timeit x = tuple(L3)
10000 loops, best of 3: 19.6 µs per loop
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.34 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.57 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.7 µs per loop
Conversion to a tuple seems to be roughly 4 times faster. But I don't think this will be critical (at least not in my own applications...)
Conversion tuple/sequence -> list
sage: %timeit x = list(T0)
1000000 loops, best of 3: 537 ns per loop
sage: %timeit x = list(T1)
1000000 loops, best of 3: 624 ns per loop
sage: %timeit x = list(T2)
1000000 loops, best of 3: 752 ns per loop
sage: %timeit x = list(T3)
100000 loops, best of 3: 22.6 µs per loop
sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.58 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.19 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 253 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 1.05 µs per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.87 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 4.95 µs per loop
sage: %timeit x = S3.list()
1000 loops, best of 3: 306 µs per loop
The gap between lists and sequences strongly depends on the bound of the sequence, which is not much of a surprise. Anyway, it is faster than before.
Slicing
For step 1, short tuples are faster than short sequences, but long sequences are faster than long tuples:
sage: timeit("x=T0[:-1]", number=100000)
100000 loops, best of 3: 479 ns per loop
sage: timeit("x=T1[:-1]", number=100000)
100000 loops, best of 3: 548 ns per loop
sage: timeit("x=T2[:-1]", number=100000)
100000 loops, best of 3: 773 ns per loop
sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.6 µs per loop
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.75 µs per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.8 µs per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.77 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.4 µs per loop
As before, a step different from 1 is bad for sequences. However, there is an improvement compared with the previous timings:
sage: timeit("x=T2[:-1:2]", number=100000)
100000 loops, best of 3: 944 ns per loop
sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.8 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop
Accessing single items
Bounded integer sequences can now compete with tuples, both short and long!
sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 349 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 351 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 346 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 347 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 350 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 370 ns per loop
Containment tests
Here we talk about operations that do not exist in this form for tuples (e.g., subsequence tests). They have been improved by the latest commits:
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 236 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 215 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 219 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 692 ns per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 238 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 228 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 1.92 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 981 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 213 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.41 ms per loop
Conclusion
 Slicing and iteration are still a weak point of bounded integer sequences. However, there is an improvement by the latest commits.
 Access to single items is now competitive. Some other operations have been competitive or superior before.
 Probably iteration and slicing could be improved with more complicated code: Currently, I take a single item, do a bit shift (which is faster than before!) and then proceed with the next item. But since several items usually fit into one "limb" (this is how GMP stores the data), it would be possible to accumulate a group of items into one limb, then do a single bit shift for this group of items, and then proceed to the next group of items. I would consider implementing it if my own applications show the need for it...
Needs review, I think! Please check if I produced a memory leak (by not freeing allocated temporary data), a memory corruption (by messing up in the case of items that are stored exactly on the seam between two limbs), or functions in which a memory error or keyboard interrupt would not be propagated. I plan to do such tests, too, but I suppose 4 or 6 eyes see more than 2.
comment:72 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues changed from Use mpn_* for speedup of iteration and item access to Fix flakyness in adding sequences
I found failures in some runs:
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-10-7f6bcfa0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1059, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
    S+T
Expected:
    <4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
Got:
    <4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>
**********************************************************************
1 item had failures:
   1 of  10 in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
    [178 tests, 1 failure, 0.79 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-15-6cf37501.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-20-d43d6691.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1061, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
    T+S
Expected:
    <4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
Got:
    <4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 25>
**********************************************************************
1 item had failures:
   1 of  10 in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
    [178 tests, 1 failure, 0.78 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-25-04f71350.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-29-a2a81517.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-33-e89cc5fe.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.81 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linuxetl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-38-83426cf0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
My first guess is that this comes from using sage_malloc in some places, i.e., I should zero the memory first.
comment:73 Changed 7 years ago by
There is something seriously broken, and apparently in the last limb of stuff:
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: T = BoundedIntegerSequence(21, [4,1,6,2,8,15])
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
sage: T+S
<4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>
My current guess is that in some place I get the number of limbs wrong when I manually assign it to the underlying mpz_t.
comment:74 Changed 7 years ago by
Now I can explain what goes wrong in the above example.
GMP has its limbs, and I have my sequence items. In the example, each item consumes 5 bits, but each limb comprises 32 bits. As it happens, the last bit of my last item is 0, and the last limb only stores this single zero bit. Hence, when I ask GMP to do high-level operations (which makes sense for concatenation; I wouldn't like to do it low-level), GMP tries to be clever and drops the last limb, since it is zero. But when I transform the result to a list, I want to access the last bit. Since that bit is no longer in any limb, it belongs to non-allocated memory. Hence, randomly, the last bit can be 0 or 1, which explains why in the example above the difference between the correct and the wrong last item is 16.
So, what I should do: When transforming a sequence to a string or list (or when iterating), I should check whether I am running outside of GMP's limbs. Alternatively, I could place a 1-bit right behind my last item, so that GMP will not drop the last limb (as I forced it to be nonzero).
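The guarded read described in the first option can be sketched in plain Python (hypothetical `read_bit` helper; `WORD` models the limb width, assumed 32 bits here): a bit index that falls into a limb GMP has dropped must be treated as zero instead of being read from unallocated memory.

```python
# Sketch of a bounds-checked bit read over a normalized limb array,
# where GMP may have dropped a most-significant all-zero limb.
WORD = 32

def read_bit(limbs, bit):
    w, off = divmod(bit, WORD)
    if w >= len(limbs):   # bit lives in a dropped (zero) limb
        return 0
    return (limbs[w] >> off) & 1
```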
comment:75 Changed 7 years ago by
Alternatively, I could replace all mpz_* by mpn_*. But that would be boring.
comment:76 Changed 7 years ago by
PS: Adding sequences that are constant zero exhibits the same problem.
comment:77 Changed 7 years ago by
I found examples that used to trigger the above-mentioned problems reliably. So, they will be added to the docs. I fixed the iteration problem, but I am still struggling with pickling.
comment:78 Changed 7 years ago by
Very odd. Could it be that I hit a bug in GMP in the conversion of an integer to a string representation at base 2^n? Namely, in my examples, the 32-adic string representation used to pickle a bounded integer sequence does not represent the stored bit array, whereas the 31-adic or 10-adic string representation works fine (but probably is slower).
This seems to happen when the last limb used to store a GMP integer is in fact zero.
comment:79 Changed 7 years ago by
Yes, setting the _mp_size field of __mpz_struct in such a way that the limb _mp_d[_mp_size-1] is nonzero solves the problem. I thought GMP would cope with "superfluous" zeroes, but in string representation it does not.
So, I have to check where I have been lazy with the _mp_size field when using mpn_* functions.
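As a pure-Python model (a hypothetical helper, not GMP's actual code), the invariant to restore after raw mpn_* writes is that `_mp_size` points just past the most significant nonzero limb; GMP's own functions assume this normalization:

```python
# Model of GMP's mpz normalization invariant: the size field must not
# count most-significant zero limbs.
def normalized_size(limbs, size):
    while size > 0 and limbs[size - 1] == 0:
        size -= 1
    return size
```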
comment:80 Changed 7 years ago by
Current status is a struggle with subsequence recognition in the presence of trailing zeroes.
Here is the type of examples that I propose:
 Use a sequence of length 7 with a bound of 32. Then, we need 35 bit to store the sequence.
 At least on my machine, a GMP limb comprises 32 bit. Hence, the above sequence fits into two limbs, we only use three bit of the second limb.
 Choose the last item of the sequence such that it fits into 2 bit. Then, the three bit stored in the second limb are zero.
 Now, do an operation that involves an mpz_* function to create a new bounded integer sequence, only adding trailing zeroes. This currently happens in concatenation and slicing. It may result in GMP cutting off the trailing zeroes.
When we then do operations involving mpn_* functions, we need to take care of the cut-off bits (and treat them as zero). My current private branch (not pushed yet) contains several tests that fail with the branch currently attached here, but succeed in my current branch.
Still failing is this:
sage: X = BoundedIntegerSequence(21, [4,1,6,2,7,2,3])
sage: S = BoundedIntegerSequence(21, [0,0,0,0,0,0,0])
sage: loads(dumps(X+S))
<4, 1, 6, 2, 7, 2, 3, 0, 0, 0, 0, 0, 0, 0>
sage: loads(dumps(X+S)) == X+S
True
sage: T = BoundedIntegerSequence(21, [0,4,0,1,0,6,0,2,0,7,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
sage: T[3::2]==(X+S)[1:]
True
sage: T[3::2] in X+S
False    # should return True!
EDIT: Recall that the given bound 21 is internally changed to the next power of two, hence 32.
comment:81 Changed 7 years ago by
 Commit changed from 735939e593c9c302ff42c4973cbfe2e48eb22644 to c900a2cd4045aa3463be31d76c9f047b05571958
Branch pushed to git repo; I updated commit sha1. New commits:
c900a2c  Take care of GMP's removal of trailing zeroes

comment:82 Changed 7 years ago by
 Status changed from needs_work to needs_review
 Work issues Fix flakyness in adding sequences deleted
I solved the problems with wrong sizes / trailing zeroes. I added all tests that used to fail with previous versions as doc tests. All tests are passing now. Hence: Needs review again. Tomorrow, I'll provide new timings, as the new sanity checks may result in a small slowdown. We will see...
comment:83 followup: ↓ 84 Changed 7 years ago by
Repeating the tests, in each case (even when the underlying function has not changed) stating the timing of the old code:
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.4 µs per loop    # was 1.34 µs
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.59 µs per loop   # was 1.57 µs
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.13 µs per loop    # was 2.06 µs
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.3 µs per loop     # was 76.7 µs
It surprises me that some of the following became faster:
sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.55 µs per loop   # was 1.58 µs
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.14 µs per loop    # was 2.06 µs
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.31 µs per loop    # was 4.19 µs
sage: %timeit x = list(S3)
1000 loops, best of 3: 274 µs per loop       # was 253 µs
sage: %timeit x = S0.list()
1000000 loops, best of 3: 810 ns per loop    # was 1.05 µs
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.22 µs per loop   # was 1.87 µs
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.9 µs per loop     # was 4.95 µs
sage: %timeit x = S3.list()
10000 loops, best of 3: 105 µs per loop      # was 306 µs
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop    # was 1.75 µs
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop    # was 1.8 µs
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.78 µs per loop    # was 1.77 µs
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.46 µs per loop    # was 2.4 µs
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.68 µs per loop     # was 2.7 µs
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop     # was 52.2 µs
The following apparently became slower:
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 370 ns per loop    # was 354 ns
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 375 ns per loop    # was 355 ns
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 371 ns per loop    # was 350 ns
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 391 ns per loop    # was 370 ns
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 233 ns per loop    # was 230 ns
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 238 ns per loop    # was 236 ns
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 457 ns per loop    # was 511 ns
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 219 ns per loop    # was 215 ns
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop    # was 219 ns
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 813 ns per loop    # was 692 ns
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 244 ns per loop    # was 238 ns
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop    # was 228 ns
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 5.59 µs per loop   # was 1.92 µs
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 990 ns per loop    # was 981 ns
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 216 ns per loop    # was 213 ns
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.33 ms per loop      # was 2.41 ms
So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as replacement of tuples in some applications.
comment:84 in reply to: ↑ 83 Changed 7 years ago by
So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as replacement of tuples in some applications.
Especially since they are probably MUCH more compact in memory.
Nathann
comment:85 Changed 7 years ago by
 Commit changed from c900a2cd4045aa3463be31d76c9f047b05571958 to 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e
Branch pushed to git repo; I updated commit sha1. New commits:
1192c21  Allow empty slices; bounded integer sequence > bool

comment:86 Changed 7 years ago by
I just fixed a segfault that occurred for empty slices, and I implemented conversion of a sequence to bool (in the usual way: a sequence is nonzero if and only if it has a positive length). Still needs review...
comment:87 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues set to Fix other out of bound errors
Sigh. I hate trailing zeroes.
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: (BoundedIntegerSequence(21, [0,0]) + BoundedIntegerSequence(21, [0,0]))
<16, 0, 30, 30>
comment:88 Changed 7 years ago by
My original plan was to provide a function to compute the maximal overlap of sequences in the followup ticket #16453. However, while I am at fixing bugs, I'll move this new function to here.
comment:89 Changed 7 years ago by
 Commit changed from 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e to ff4477a1600b44031c2fb962ff3c5ab24c110410
Branch pushed to git repo; I updated commit sha1. New commits:
ff4477a  Remove biseq_to_str. Add max_overlap. Fix boundary violations

comment:90 Changed 7 years ago by
 Status changed from needs_work to needs_review
Done!
Nathann, while I was at it, I have removed the biseq_to_str function, as we discussed off-trac.
comment:91 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues changed from Fix other out of bound errors to Fix containement test
Again, there is trouble with zeroes. The sequence <0> is currently considered a subsequence of the sequence <1, 3>:
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S1 = BoundedIntegerSequence(3,[1,3])
sage: S2 = BoundedIntegerSequence(3,[0])
sage: S2 in S1   # very wrong
True
sage: S1.startswith(S2)   # correct
False
comment:92 Changed 7 years ago by
It seems that the problem lies in mpz_congruent_2exp_p(A, B, nbits): In the problematic situation, A==0, B==13 and nbits==2. The function should return False, since 13 is not congruent to 0 modulo 4 (i.e., modulo two bits). But it doesn't. And that's totally weird. I'll try to reproduce it independently of the bounded integer sequences, i.e., I will see whether it is a horrible GMP bug. Probably it isn't.
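For reference, here is a pure-Python model of the test that `mpz_congruent_2exp_p(a, b, n)` performs (true iff a ≡ b mod 2^n, i.e. the low n bits agree); it confirms what the expected result in the situation above should be:

```python
# Pure-Python model of GMP's mpz_congruent_2exp_p: congruence modulo 2^n.
def congruent_2exp(a, b, n):
    return (a - b) % (1 << n) == 0
```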
comment:93 Changed 7 years ago by
Yes, it isn't:
sage: cython("""
....: include "sage/libs/ntl/decl.pxi"
....: def test(x,y):
....:     cdef mpz_t a,b
....:     mpz_init_set_ui(a, x)
....:     mpz_init_set_ui(b, y)
....:     print mpz_congruent_2exp_p(a,b,2)
....:     print mpz_get_ui(a), "vs", mpz_get_ui(b)
....:     print (<__mpz_struct*>a)._mp_size
....:     print (<__mpz_struct*>b)._mp_size
....:     mpz_clear(a)
....:     mpz_clear(b)
....: """)
sage: test(0,13)
False
0 vs 13
0
1
My current impression is that in the same situation as above, mpz_congruent_2exp_p gives the wrong result (True) in the bounded integer sequence code. Strange.
comment:94 Changed 7 years ago by
Argh, now I see the difference: By a bug, I somehow managed to give the number 13 the size zero. mpz_congruent_2exp_p thus believes that 13==0 and correctly concludes that 0 is congruent to 13 modulo 4.
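A tiny Python model of why the wrong size field makes GMP "see" zero (assumed 32-bit limbs for the sketch; the value of an mpz is determined by its first `_mp_size` limbs only, whatever junk sits in the rest of the allocation):

```python
# Model of mpz value semantics: only the first `size` limbs contribute.
WORD = 32

def mpz_value(limbs, size):
    return sum(limbs[i] << (i * WORD) for i in range(size))
```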
comment:95 Changed 7 years ago by
 Commit changed from ff4477a1600b44031c2fb962ff3c5ab24c110410 to 93546db866564a80d80df30de48934697e2b0d1d
comment:96 Changed 7 years ago by
 Status changed from needs_work to needs_review
 Work issues Fix containement test deleted
The problem is fixed. I inserted comments into the code that hopefully explain what happens in the subsequence containment test.
Doctests pass (including a new test that has previously failed), hence, it needs review again!
comment:97 Changed 7 years ago by
 Milestone changed from sage6.3 to sage6.4
comment:98 followup: ↓ 99 Changed 7 years ago by
 Status changed from needs_review to needs_work
still two failing doctests, see patchbot report
comment:99 in reply to: ↑ 98 Changed 7 years ago by
Replying to chapoton:
still two failing doctests, see patchbot report
This is all very odd. All patchbots report the same error, but I don't see it on my own machine. So far, I don't know where I could start debugging.
comment:100 Changed 7 years ago by
Is it perhaps the case that something relevant (GMP) was upgraded after sage-6.3.beta5? That's the version that I use on my laptop.
comment:101 Changed 7 years ago by
Currently I won't upgrade to sage-6.4, as I am rather busy with another project. Could one of you please test whether the error appears in sage-6.4 but not in sage-6.3?
comment:102 Changed 7 years ago by
I will test on 6.4.beta1. Building right now.
comment:103 Changed 7 years ago by
I confirm the doctest failures on 6.4.beta1:
sage -t --long src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1086, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
    S[1::2]
Expected:
    <1, 2, 20>
Got:
    <1, 6, 23>
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1088, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
    S[-1::-2]
Expected:
    <9, 7, 6, 4>
Got:
    <9, 7, 22, 15>
comment:104 Changed 7 years ago by
Interesting!
So, what has changed between sage-6.3.b5 and sage-6.4.b1? Specifically, has something happened with GMP?
comment:105 Changed 7 years ago by
PS: Can you reproduce it on the command line?
At some point I should upgrade, to fix the issue...
comment:106 Changed 7 years ago by
Yes, same behaviour in the command line.
comment:107 Changed 7 years ago by
I have upgraded now, and I do not see the error.
So, what could we do to trace the problem down?
comment:108 Changed 7 years ago by
Since I cannot reproduce the problem myself, could you please help me by providing some internal data? Perhaps there are different architectures (32-bit or 64-bit, big or little endian) at work?
Please add the following as a method of BoundedIntegerSequence:
def inspect(self):
    cdef __mpz_struct seq
    seq = deref(<__mpz_struct*>self.data.data)
    print "bitsize", self.data.bitsize
    print "itembitsize", self.data.itembitsize
    print "mask_item", Integer(self.data.mask_item).binary()
    print "length", self.data.length
    print "GMP size", seq._mp_size
    print "GMP alloc", seq._mp_alloc
    print '.'.join([Integer(seq._mp_d[limb]).binary().rjust(mp_bits_per_limb,'0') for limb in range(seq._mp_size-1,-1,-1)])
With this, I get
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
mask_item 11111
length 7
GMP size 2
GMP alloc 3
00000000000000000000000000000010.01101000011100010001100000100100
sage: S[1::2]
<1, 2, 20>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
mask_item 11111
length 3
GMP size 1
GMP alloc 2
00000000000000000101000001000001
sage: S[-1::-2]
<9, 7, 6, 4>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
mask_item 11111
length 4
GMP size 1
GMP alloc 2
00000000000000100001100011101001
What do you get instead?
comment:109 Changed 7 years ago by
got that:
sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
mask_item 11111
length 7
GMP size 1
GMP alloc 2
0000000000000000000000000000001001101000011100010001100000100100
sage: S[1::2]
<1, 6, 23>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
mask_item 11111
length 3
GMP size 1
GMP alloc 2
0000000000000000000000000000000000010011110111111101110011000001
sage: S[-1::-2]
<9, 7, 22, 15>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
mask_item 11111
length 4
GMP size 1
GMP alloc 2
0000000000000001001101000011101011101100011101111101100011101001
comment:110 Changed 7 years ago by
Thank you! So, one obvious difference is that on your computer, GMP uses chunks of 64 bit to store long integers, whereas it is only 32 bit on my machine (even though it is a 64 bit CPU). I'll see if I can make sense of the error.
comment:111 Changed 7 years ago by
Another interesting point is that the single limb used to store S[1::2] resp. S[-1::-2] on your machine is full of junk bits.
Anyway, what surprises me is the following: Up to now, all problems arose because of a misfit where two limbs meet. But here, for the first time, we have errors occurring within a single limb.
comment:112 Changed 7 years ago by
Hooray, I can reproduce the problem on my machine, with a different example!
The key is (as on your machine) to create a sequence that fits inside a single limb, and then slice:
    sage: S = BoundedIntegerSequence(8, [4,1,6,2,7,2,5,5,2])
    sage: S
    <4, 1, 6, 2, 7, 2, 5, 5, 2>
    sage: S[1::2]
    <1, 6, 7, 7>
    sage: S[-1::-2]
    <2, 5, 7, 6, 7>
The sequence above is of length 9, with items fitting into 3 bits. Hence, 27 bits are used, whereas one limb comprises 32 bits.
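To see why everything fits into one 32-bit limb here, the packing can be sketched in plain Python. This is a hedged model of the layout, not the ticket's Cython code; the names items, itembitsize, bitsize and mask_item just mirror the inspect() output above.

```python
# Model of the bit layout: item i occupies bits [i*itembitsize, (i+1)*itembitsize).
items = [4, 1, 6, 2, 7, 2, 5, 5, 2]
itembitsize = 3                       # bound 8 needs 3 bits per item
bitsize = len(items) * itembitsize    # 9 * 3 = 27 bits, less than one 32-bit limb

packed = 0
for i, x in enumerate(items):
    packed |= x << (i * itembitsize)

# Extracting item i is a shift followed by a mask:
mask_item = (1 << itembitsize) - 1
unpacked = [(packed >> (i * itembitsize)) & mask_item for i in range(len(items))]

assert bitsize == 27 and bitsize < 32   # the whole sequence lives in one limb
assert unpacked == items
```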
So, stuff for debugging!
comment:113 Changed 7 years ago by
Got it!
In some branches of the slice_biseq function, I forgot to restrict the to-be-copied data down to the bitsize of items (i.e., I forgot ...&S.mask_item). This oversight only happened in branches dealing with the case that everything fits within a single limb. Hence, no error in the old doctest for me (because it comprises two limbs on my machine), but an error for you (because it comprises a single limb on your machine).
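The bug class can be illustrated in a few lines of Python. This is illustrative only, simulating one limb holding 5-bit items; it is not the actual slice_biseq code. Without the final &mask_item, the bits of the neighbouring items survive the shift.

```python
itembitsize = 5
mask_item = (1 << itembitsize) - 1        # 0b11111

# One limb holding the items <1, 2, 20>, packed little-endian as above.
limb = (20 << 10) | (2 << 5) | 1

offset = 5                                 # bit offset of the item '2'
buggy = limb >> offset                     # higher items leak into the result
fixed = (limb >> offset) & mask_item       # restricted to one item's bits

assert buggy == (20 << 5) | 2              # 642: junk, not a valid item value
assert fixed == 2
```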
comment:114 Changed 7 years ago by
 Commit changed from 93546db866564a80d80df30de48934697e2b0d1d to c69c67ccf3093a3c947ebbe83d5224daf695c3f9
comment:115 Changed 7 years ago by
 Status changed from needs_work to needs_review
The fix is pushed, and of course the example that was failing on my machine has become a new doctest.
comment:116 Changed 7 years ago by
 Commit changed from c69c67ccf3093a3c947ebbe83d5224daf695c3f9 to b331dc4f12e7291e9537f607ed155085cabc3fcd
Branch pushed to git repo; I updated commit sha1. New commits:
b331dc4  Fix mem leak converting zerovalued biseq_t to list

comment:117 Changed 7 years ago by
I have fixed a memory leak: In the function biseq_to_list, some temporary buffer was allocated, but in the case that the sequence is zero-valued, the function returned without freeing the buffer. Solution: only allocate the buffer after checking that the sequence is not zero-valued.
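A hedged Python model of that leak pattern (the function names are illustrative; the real code uses sage_malloc/sage_free, and "zero-valued" here is modelled by an empty input):

```python
allocated = []   # stands in for the allocator's bookkeeping

def leaky_to_list(seq):
    buf = object(); allocated.append(buf)   # buffer allocated unconditionally
    if not seq:
        return []                           # early return: buf is never freed
    allocated.remove(buf)                   # freed on the normal path
    return list(seq)

def fixed_to_list(seq):
    if not seq:                             # check the trivial case first ...
        return []
    buf = object(); allocated.append(buf)   # ... and only then allocate
    allocated.remove(buf)
    return list(seq)

leaky_to_list([])
assert len(allocated) == 1                  # one buffer leaked
fixed_to_list([])
assert len(allocated) == 1                  # the fixed version leaks nothing
```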
Still needing review (hinthint)...
comment:118 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues set to memory corruption
I found that the following function (written into bounded_integer_sequences.pyx) manages to crash Sage:
    def _biseq_stresstest():
        cdef int i
        from sage.misc.prandom import randint
        cdef list L = [BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))]) for y in range(100)]
        cdef int branch
        cdef BoundedIntegerSequence S, T
        for i from 0<=i<10000:
            if (i%1000) == 0:
                print i
            branch = randint(0,4)
            if branch == 0:
                L[randint(0,99)] = L[randint(0,99)]+L[randint(0,99)]
            elif branch == 1:
                x = randint(0,99)
                if len(L[x]):
                    y = randint(0,len(L[x])-1)
                    z = randint(y,len(L[x])-1)
                    L[randint(0,99)] = L[x][y:z]
                else:
                    L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
            elif branch == 2:
                t = list(L[randint(0,99)])
                t = repr(L[randint(0,99)])
                t = L[randint(0,99)].list()
            elif branch == 3:
                x = randint(0,99)
                if len(L[x]):
                    t = L[x][randint(0,len(L[x])-1)]
                    try:
                        t = L[x].index(t)
                    except ValueError:
                        print L[x]
                        print t
                        raise
                else:
                    L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
            elif branch == 4:
                S = L[randint(0,99)]
                T = L[randint(0,99)]
                startswith_biseq(S.data,T.data)
                contains_biseq(S.data, T.data, 0)
                max_overlap_biseq(S.data, T.data)
Can someone please see what happens on other machines? On mine (with the above function, but also with something that was supposed to be the cornerstone of my current project), crashes occur in sage_free or mpz_clear. Surprisingly, the variables to be freed/cleared should be just fine; in some cases they haven't even been touched after allocation.
My wild guess is that using sage_malloc (in bounded_integer_sequences.pyx) somehow interferes with GMP's (MPIR's?) memory management. Even if the reason is different, I suppose it would make sense to get rid of sage_malloc and sage_free in my code. After all, I use it for buffers whose length is known at compile time. Hence, I should better use arrays.
So, my plan is: use arrays instead of sage_malloc, and hope that this is enough to let the above function pass, which would then become a doctest.
comment:119 followup: ↓ 120 Changed 7 years ago by
Hmm. Using, for example,
cdef mp_limb_t tmp_limb[1]
instead of
    cdef mp_limb_t *tmp_limb
    tmp_limb = <mp_limb_t*>sage_malloc(sizeof(mp_limb_t))
compiles, but then it seems that the mpn_* functions do not correctly put values into the arrays when I give them tmp_limb as argument.
Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?
comment:120 in reply to: ↑ 119 ; followup: ↓ 121 Changed 7 years ago by
Replying to SimonKing:
Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?
Forget the question. The problem was somewhere else. Now I think I can make arrays work as a replacement for the current usage of sage_malloc.
comment:121 in reply to: ↑ 120 Changed 7 years ago by
Replying to SimonKing:
Replying to SimonKing:
Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?

Forget the question. The problem was somewhere else. Now I think I can make arrays work as a replacement for the current usage of sage_malloc.
Do not forget the question. There is something extremely strange going on. When I insert a print statement after assigning the size of an __mpz_struct*, then things work. Without the print statement, the assignment of the size is ignored and the value is incorrectly left zero.
So, how can a print statement influence whether or not some value of a variable in GMP is assigned?
comment:122 Changed 7 years ago by
 Commit changed from b331dc4f12e7291e9537f607ed155085cabc3fcd to 7b53dc82d032d5a787e10731938f9048b620031b
Branch pushed to git repo; I updated commit sha1. New commits:
7b53dc8  Try to improve memory management for biseq_t, and add a stress test

comment:123 Changed 7 years ago by
I have replaced sage_malloc by arrays, since the array size is known at compile time. I think this is a good idea anyway, and the doctests pass.
But not all is good, and here I need help: _biseq_stresstest() either crashes with free(): invalid next size (fast), or it raises a ValueError stating that some element is not in a bounded integer sequence, even though by construction it is contained. So, memory management still goes haywire.
Any clues or pointers?
comment:124 Changed 7 years ago by
I was running it under valgrind, and as a result I did not see a crash! However, the other problem (a ValueError when a sequence is supposed to return the index of one of its elements) still persists under valgrind.
Concerning the crash: What can one do to further debug if a crash can not be reproduced with valgrind?
comment:125 Changed 7 years ago by
PS: I cannot quit Sage when I run it under valgrind: "quit" has no effect!
comment:126 Changed 7 years ago by
Even "pkill python" did not manage to terminate the valgrinded Sage session. Boy, am I fed up now. Time to call it a day.
comment:127 Changed 7 years ago by
Actually I realise that I made a severe mistake all the time: concat_biseq returns a pointer to a local variable, and I did so in order to make it possible to propagate errors (by saying ... except NULL). But returning a pointer to a local variable cannot possibly work.
Since I want to propagate errors, I suppose it will be better to consistently work with pointers. I'll rewrite the module accordingly...
comment:128 Changed 7 years ago by
It seems that that was the reason indeed! After rewriting everything so that biseq_t becomes a pointer to a struct (before, biseq_t was the struct), all doctests pass, and the stress test doesn't crash.
But not all is good: the stress test doesn't crash, but sporadically it cannot find the index of an element of a sequence. So, it needs work and I will not push yet, but it is almost done.
comment:129 Changed 7 years ago by
The stresstest found that the following fails:
    sage: S = BoundedIntegerSequence(6, [3, 5, 3, 1, 5, 2, 2, 5, 3, 3, 4])
    sage: S[-1]
    4
Currently it returns 0.
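The usual culprit for this kind of failure is negative-index handling; a hedged sketch of the expected normalization, in plain Python rather than the ticket's item-access code:

```python
def normalize_index(i, length):
    # Map Python-style negative indices onto 0 .. length-1 before any
    # bit-offset arithmetic happens.
    if i < 0:
        i += length
    if not 0 <= i < length:
        raise IndexError(i)
    return i

items = [3, 5, 3, 1, 5, 2, 2, 5, 3, 3, 4]
assert items[normalize_index(-1, len(items))] == 4   # the last item, not 0
assert normalize_index(-11, 11) == 0                 # first item via -len
```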
comment:130 Changed 7 years ago by
 Commit changed from 7b53dc82d032d5a787e10731938f9048b620031b to 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2
comment:131 Changed 7 years ago by
The stress test is really valuable. I found several small problems concerning corner cases. And, as I have announced, I have reworked the code so that only "proper" pointers rather than stupid references to local variables are passed around.
There remain problems, though. In the doctests, I have commented out the stress test, because there are STILL crashes. This time, mpz_clear sometimes doesn't work. My wild guess is that at some point I write out of bounds when using mpn_* functions.
New commits:
7c9f0f2  Biseq refactored, so that only pointers are passed around.

0aa3cbc  Fix corner cases in item access for bounded integer sequences

comment:132 Changed 7 years ago by
Hooray, by inserting explicit range checks, I found that I am indeed writing out of the allocated area in slice_biseq. That (and similar errors?) will hopefully explain the crashes...
comment:133 Changed 7 years ago by
 Commit changed from 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2 to f20dc096832f260edee2f28b3007bf42263f54ac
Branch pushed to git repo; I updated commit sha1. New commits:
f20dc09  Fix writing out of bounds, and assert that the bounds are respected

comment:134 followup: ↓ 135 Changed 7 years ago by
 Status changed from needs_work to needs_review
In the new commit, I added assertions to make sure that I am not writing into nonallocated memory. So, the code should now be water proof in that regard.
All tests pass, including the "stress test". So, it can be reviewed now!
Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?
comment:135 in reply to: ↑ 134 Changed 7 years ago by
Replying to SimonKing:
Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?
If you run Python with the option -O, all the assert calls are ignored. Something similar should be doable, at compile time of course, for Cython.

Most programming languages support this to encourage developers to never remove asserts. We really should have this feature in Sage. A good starting point would be to have sage -O call python -O.
Cheers,
Nicolas
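For reference, the -O behaviour is easy to check with CPython directly (a small self-contained demo, not Sage-specific):

```python
import subprocess, sys

snippet = "assert False, 'asserts are enabled'\nprint('ok')"

# Default run: the assert fires and the process exits with an error.
normal = subprocess.run([sys.executable, "-c", snippet],
                        capture_output=True, text=True)
# With -O, assert statements are compiled away entirely, so we reach the print.
optimized = subprocess.run([sys.executable, "-O", "-c", snippet],
                           capture_output=True, text=True)

assert normal.returncode != 0
assert optimized.returncode == 0 and optimized.stdout.strip() == "ok"
```

So asserts left in library code cost nothing in an optimized run, which is the argument for never removing them.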
comment:136 Changed 7 years ago by
 Commit changed from f20dc096832f260edee2f28b3007bf42263f54ac to cea38bb0a7d6f748cfc1feef357df5e22eb7b652
Branch pushed to git repo; I updated commit sha1. New commits:
cea38bb  Change doc according to the changed functionality of list_to_biseq

comment:137 Changed 7 years ago by
 Work issues memory corruption deleted
comment:138 Changed 7 years ago by
 Commit changed from cea38bb0a7d6f748cfc1feef357df5e22eb7b652 to 4611f8aa591616f118b50ebc2d2516c635e40318
Branch pushed to git repo; I updated commit sha1. New commits:
4611f8a  Minor changes in the docs

comment:139 Changed 7 years ago by
 Commit changed from 4611f8aa591616f118b50ebc2d2516c635e40318 to 815d77c77f48605ea33b87a88120aaabadfb5f7f
Branch pushed to git repo; I updated commit sha1. New commits:
815d77c  mpn_r/lshift should only be used with strictly positive shift

comment:140 Changed 7 years ago by
This is to explain the most recent commit: By doing some noncommutative Gröbner basis computations relying on biseq_t, I obtained crashes apparently caused by the slice_biseq function. It uses mpn_rshift, and it did so even when the shift was zero. However, I found in the docs that the shift is supposed to be at least 1. Hence, when there is no shift, I am now using mpn_copyi instead. Hope it helps to prevent the crash...
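The reason a zero shift must be special-cased can be modelled by simulating fixed-width shifts in Python. The classic two-limb combine shifts the high limb left by W - s; for s == 0 that is a shift by the full word width, which C leaves undefined and which x86 hardware reduces mod W, so junk bits leak in. The width and function names here are illustrative.

```python
W = 64
MASK = (1 << W) - 1

def hw_shl(x, cnt):
    # Model of x86 behaviour: the shift count is taken mod the word width.
    return (x << (cnt % W)) & MASK

def combine_buggy(low, high, s):
    # Combine a two-limb window without special-casing s == 0.
    return ((low >> s) | hw_shl(high, W - s)) & MASK

def combine_fixed(low, high, s):
    if s == 0:
        return low            # plain copy, the mpn_copyi case
    return ((low >> s) | ((high << (W - s)) & MASK)) & MASK

low, high = 0x0123456789ABCDEF, 0xFEDCBA9876543210
assert combine_fixed(low, high, 0) == low
assert combine_buggy(low, high, 0) == (low | high) & MASK   # junk from the high limb
assert combine_buggy(low, high, 4) == combine_fixed(low, high, 4)
```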
comment:141 Changed 7 years ago by
Argh. No, it did not fix the crash in my Gröbner basis application. Anyway, I still think the commit makes sense.
comment:142 Changed 7 years ago by
 Commit changed from 815d77c77f48605ea33b87a88120aaabadfb5f7f to 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a
Branch pushed to git repo; I updated commit sha1. New commits:
6dfb1cb  Make contains_biseq interruptible and add to its doc

comment:143 followup: ↓ 144 Changed 7 years ago by
I'm getting three valgrind warnings (#15586):
    sage: from sage.misc.bounded_integer_sequences import _biseq_stresstest
    sage: _biseq_stresstest()
    ==3937== Invalid read of size 8
    ==3937==    at 0x30A524B0: __gmpn_rshift (rshift.c:55)
    ==3937==    by 0x48CCB50B: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2611)
    ==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
    ==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
    ==3937==    by 0x4F3E768: call_function (ceval.c:4017)
    ==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
    [...]
    ==3937== Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd
    ==3937==    at 0x4C29BCF: malloc (vg_replace_malloc.c:296)
    ==3937==    by 0x142EC8FD: sage_malloc (in /home/vbraun/Sage/gitdevelop/local/lib/libcsage.so)
    ==3937==    by 0x142EC9E4: sage_gmp_malloc (in /home/vbraun/Sage/gitdevelop/local/lib/libcsage.so)
    ==3937==    by 0x30A439EE: __gmpz_init2 (init2.c:33)
    ==3937==    by 0x48CCE302: __pyx_f_4sage_4misc_25bounded_integer_sequences_concat_biseq (bounded_integer_sequences.c:2256)
    ==3937==    by 0x48CD6806: __pyx_pf_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_31__add__ (bounded_integer_sequences.c:7201)
    ==3937==    by 0x48CD6806: __pyx_pw_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_32__add__ (bounded_integer_sequences.c:7078)
    ==3937==    by 0x4E80BC8: binary_op1 (abstract.c:945)
    ==3937==    by 0x4E80BC8: PyNumber_Add (abstract.c:1185)
    ==3937==    by 0x48CD39AD: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8281)
    ==3937==    by 0x48CD39AD: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
    ==3937==    by 0x4F3E768: call_function (ceval.c:4017)
    ==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
    [...]
    ==3937== Conditional jump or move depends on uninitialised value(s)
    ==3937==    at 0x48CCB5C3: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2828)
    ==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
    ==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
    ==3937==    by 0x4F3E768: call_function (ceval.c:4017)
    ==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
    ==3937==    by 0x4F3F24F: PyEval_EvalCodeEx (ceval.c:3265)
    ==3937==    by 0x4F3F378: PyEval_EvalCode (ceval.c:673)
The middle one is almost certainly an error, the other two don't look too good either.
comment:144 in reply to: ↑ 143 Changed 7 years ago by
Good that I added the stress test!
Replying to vbraun:
I'm getting three valgrind warnings (#15586):
Do I understand correctly: it means that contains_biseq could read and concat_biseq could write stuff in non-allocated memory?
The middle one is almost certainly an error, the other two don't look too good either.
Will the given line numbers (2611, 2256, 2828) be the same on my machine?
comment:145 followup: ↓ 152 Changed 7 years ago by
I think they are, but I wouldn't bet anything on it ;)
comment:146 followup: ↓ 147 Changed 7 years ago by
1) is contains_biseq reading into unallocated memory, and 2) says that the invalid read is just beyond what concat_biseq allocated.
comment:147 in reply to: ↑ 146 Changed 7 years ago by
Replying to vbraun:
1) is contains_biseq reading into unallocated memory, and 2) says that the invalid read is just beyond what concat_biseq allocated.
Sigh. Could be that I even don't see where 1), 2) and 3) are in the valgrind log...
Concerning the invalid read of size 8 by __gmpn_rshift: I think I recall that at some point I wanted to avoid case distinctions and thus allowed rshifting memory that is partially uninitialised, since after the shift I would only access the bytes that were allocated. Probably this is improper and should be fixed.
Concerning Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd: I thought this meant I am writing into non-allocated memory. But from your new comment, I understand that it just indicates where the memory accessed by the previous item (__gmpn_rshift) was allocated. I guess I need an explanation.
Concerning Conditional jump or move depends on uninitialised value(s): Is this independent of the two previous points? Or does it concern the same example and the same memory location?
comment:148 followup: ↓ 149 Changed 7 years ago by
With the updated valgrind these are the only errors while running the test suite(!). Trivial to spot. It's possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;)
comment:149 in reply to: ↑ 148 Changed 7 years ago by
Replying to vbraun:
With the updated valgrind they are the only errors while running the testsuite(!).
Wow, that's excellent news (except for my code :/ )!
Trivial to spot. Its possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;)
True. Anyway, to fix it I probably need answers to my questions from comment:147 in order to know exactly what to do.
comment:150 Changed 7 years ago by
In the first one you are reading past the end of the buffer from unallocated memory. Either you need to allocate enough (round up) or also handle the cases where you have less than a full 8 bytes. Even though its highly unlikely, reading past the end could even result in a segfault if the process space happens to end there.
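The "round up" remedy amounts to allocating whole limbs for a given bit count, so that a full-word read at the tail of the data stays inside the buffer. A minimal sketch (illustrative names, 64-bit limbs assumed as the default):

```python
def limbs_for_bits(bitsize, bits_per_limb=64):
    # Ceiling division: every started limb is allocated in full, so reading
    # the last limb as a whole word never runs past the allocation.
    return -(-bitsize // bits_per_limb)

assert limbs_for_bits(27, 32) == 1    # the 27-bit example above: one 32-bit limb
assert limbs_for_bits(35, 32) == 2    # the 35-bit example: two 32-bit limbs
assert limbs_for_bits(64) == 1
assert limbs_for_bits(65) == 2
```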
comment:151 Changed 7 years ago by
If it's an "invalid read of size 8" and "0 bytes after a block" in mpn_copyi, chances are it's not a bug. GMP/MPIR will sometimes read 16 aligned bytes at a time when there are just 8 bytes left to copy, on certain architectures where this is safe. This might be the case in mpn_rshift too.
If you're absolutely sure that your code should be correct there, you could try compiling MPIR with assembly optimizations disabled and see if the error goes away. Or easier, temporarily replace the mpn_rshift in your code with a plain C implementation (e.g. the one in mpn/generic/rshift.c in MPIR).
The "Conditional jump or move depends on uninitialised value" is absolutely a bug.
comment:152 in reply to: ↑ 145 Changed 7 years ago by
Replying to vbraun:
I think they (the line numbers, SK) are (the same on different machines, SK), but I wouldn't bet anything on it ;)
Hmm. The conditional jump occurs in line 2828 of the C file, but this is:
/* "sage/misc/bounded_integer_sequences.pyx":398 * n += S1.itembitsize * mpz_clear(tmp) * return 1 # <<<<<<<<<<<<<< * * cdef int index_biseq(biseq_t S, int item, size_t start) except 2: */ #### Here is 2828 __pyx_r = 1; goto __pyx_L0;
Hm. Volker, could you provide the location of the conditional jump that is giving you the warning?
comment:153 Changed 7 years ago by
 Status changed from needs_review to needs_work
 Work issues set to Rework by using Sage's existing bitset code
I had a chat with Nathann today. He asked me about the data structure used by GMP, which is mainly long* (aka "limbs"), plus some information on the number of limbs used and the number of limbs allocated. long* is also used by sage.misc.bitset.
What I am implementing here is in fact quite similar to bitsets (one item of a bounded integer sequence corresponds to a chunk of bits, but otherwise it is the same). So, he suggested to not duplicate code, but make mutual use.
I have looked at the bitset code in the past, but apparently it was long before I implemented bounded integer sequences: Now I realize that much of what I do here is actually done in sage.misc.bitset! So, I should definitely reuse the code. Hence, I am marking this as "needs work" for now, as the code will totally change.
Suggestion: I'll add to sage.misc.bitset what is not already there, and then sage.misc.bounded_integer_sequence will not provide all the boilerplate code, but only the Python layer to comfortably use bounded integer sequences.
comment:154 Changed 7 years ago by
I do need a bit more for biseq_t than what is provided by bitset_t. For example, I need to store how many bits are required to store one item of the list. But anyway, I will now try to rewrite my code.
comment:155 Changed 7 years ago by
 Commit changed from 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a to 47c639d55d3c07670b8b52a629f7cf86a8beb7ce
comment:156 Changed 7 years ago by
 Status changed from needs_work to needs_review
 Work issues Rework by using Sage's existing bitset code deleted
By suggestion of Nathann, I am now reusing part of the existing code from sage.misc.bitset. The main advantage is that (in contrast to the mpz_* functions) leading zeros will not be silently removed, which means that I could avoid many special cases.
Hopefully Nathann will find the new version of the code a lot less frightening than the old version...
Needs review again, and soonish I will update the timings that have occasionally been mentioned in comments above.
comment:157 Changed 7 years ago by
Let us compare the performance tests from comment:83 (also adding tests for pickling) with the new code version (I am testing on the same laptop, thus, hopefully a comparison makes sense):
    sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
    sage: L0 = [randint(0,7) for i in range(5)]
    sage: L1 = [randint(0,15) for i in range(15)]
    sage: L2 = [randint(0,31) for i in range(50)]
    sage: L3 = [randint(0,31) for i in range(5000)]
    sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
    sage: S0 = BoundedIntegerSequence(8, L0)
    sage: S1 = BoundedIntegerSequence(16, L1)
    sage: S2 = BoundedIntegerSequence(32, L2)
    sage: S3 = BoundedIntegerSequence(32, L3)
    sage: %timeit x = BoundedIntegerSequence(8, L0)
    1000000 loops, best of 3: 1.15 µs per loop
    sage: %timeit x = BoundedIntegerSequence(16, L1)
    1000000 loops, best of 3: 1.35 µs per loop
    sage: %timeit x = BoundedIntegerSequence(32, L2)
    1000000 loops, best of 3: 1.88 µs per loop
    sage: %timeit x = BoundedIntegerSequence(32, L3)
    10000 loops, best of 3: 70.5 µs per loop
> Became all faster
    sage: %timeit x = list(S0)
    1000000 loops, best of 3: 1.43 µs per loop
    sage: %timeit x = list(S1)
    100000 loops, best of 3: 1.84 µs per loop
    sage: %timeit x = list(S2)
    100000 loops, best of 3: 4.41 µs per loop
    sage: %timeit x = list(S3)
    1000 loops, best of 3: 304 µs per loop
    sage: %timeit x = S0.list()
    1000000 loops, best of 3: 666 ns per loop
    sage: %timeit x = S1.list()
    1000000 loops, best of 3: 1.2 µs per loop
    sage: %timeit x = S2.list()
    100000 loops, best of 3: 2.77 µs per loop
    sage: %timeit x = S3.list()
    10000 loops, best of 3: 112 µs per loop
> Became faster at least for short sequences, which is the case I am mainly interested in
sage: timeit("x=S0[:1]", number=100000) 100000 loops, best of 3: 1.49 µs per loop sage: timeit("x=S1[:1]", number=100000) 100000 loops, best of 3: 1.52 µs per loop sage: timeit("x=S2[:1]", number=100000) 100000 loops, best of 3: 1.52 µs per loop sage: timeit("x=S3[:1]", number=100000) 100000 loops, best of 3: 2.29 µs per loop
> Became all faster
sage: timeit("x=S2[:1:2]", number=10000) 10000 loops, best of 3: 2.5 µs per loop sage: timeit("x=S3[:1:2]", number=10000) 10000 loops, best of 3: 62.2 µs per loop
> Became faster for the shorter sequence
sage: timeit("x=S0[1]", number=1000000) 1000000 loops, best of 3: 340 ns per loop sage: timeit("x=S0[4]", number=1000000) 1000000 loops, best of 3: 339 ns per loop sage: timeit("x=S3[1]", number=1000000) 1000000 loops, best of 3: 339 ns per loop sage: timeit("x=S3[4500]", number=1000000) 1000000 loops, best of 3: 345 ns per loop
> Became all faster
    sage: L0x = [randint(0,7) for i in range(5)]
    sage: L1x = [randint(0,15) for i in range(15)]
    sage: L2x = [randint(0,31) for i in range(50)]
    sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
    sage: S0x = BoundedIntegerSequence(8, L0x)
    sage: S1x = BoundedIntegerSequence(16, L1x)
    sage: S2x = BoundedIntegerSequence(32, L2x)
    sage: S3x = BoundedIntegerSequence(32, L3x)
    sage: S1y = BoundedIntegerSequence(16, L1)
    sage: S2y = BoundedIntegerSequence(32, L2)
    sage: S3y = BoundedIntegerSequence(32, L3)
    sage: S1y==S1!=S1x, S1y is not S1
    (True, True)
    sage: S0z1 = S0+S0
    sage: S0z2 = S0+S0x
    sage: S1z1 = S1+S1
    sage: S1z2 = S1+S1x
    sage: S2z1 = S2+S2
    sage: S2z2 = S2+S2x
    sage: S3z1 = S3+S3
    sage: S3z2 = S3+S3x
    sage: timeit("S0z2.startswith(S0)", number=1000000)
    1000000 loops, best of 3: 210 ns per loop
    sage: timeit("S0z2.startswith(S0x)", number=1000000)
    1000000 loops, best of 3: 211 ns per loop
    sage: timeit("S0x in S0z2", number=1000000)
    1000000 loops, best of 3: 368 ns per loop
    sage: timeit("S1z2.startswith(S1)", number=1000000)
    1000000 loops, best of 3: 200 ns per loop
    sage: timeit("S1z2.startswith(S1x)", number=1000000)
    1000000 loops, best of 3: 199 ns per loop
    sage: timeit("S1x in S1z2", number=1000000)
    1000000 loops, best of 3: 542 ns per loop
    sage: timeit("S2z2.startswith(S2)", number=1000000)
    1000000 loops, best of 3: 219 ns per loop
    sage: timeit("S2z2.startswith(S2x)", number=1000000)
    1000000 loops, best of 3: 205 ns per loop
    sage: timeit("S2x in S2z2", number=1000000)
    1000000 loops, best of 3: 2.33 µs per loop
    sage: timeit("S3z2.startswith(S3)", number=1000000)
    1000000 loops, best of 3: 963 ns per loop
    sage: timeit("S3z2.startswith(S3x)", number=1000000)
    1000000 loops, best of 3: 201 ns per loop
    sage: timeit("S3x in S3z2", number=1000)
    1000 loops, best of 3: 2.36 ms per loop
> All but the last became faster, which is excellent for my applications, as I am using subsequence containment tests fairly often.
Additional timings for pickling, comparing Python lists with bounded integer sequences:
    sage: %timeit loads(dumps(L0))
    10000 loops, best of 3: 39.5 µs per loop
    sage: %timeit loads(dumps(L1))
    10000 loops, best of 3: 43.9 µs per loop
    sage: %timeit loads(dumps(L2))
    10000 loops, best of 3: 65.5 µs per loop
    sage: %timeit loads(dumps(L3))
    100 loops, best of 3: 2.13 ms per loop
    sage: %timeit loads(dumps(S0))
    10000 loops, best of 3: 68.7 µs per loop
    sage: %timeit loads(dumps(S1))
    10000 loops, best of 3: 72.1 µs per loop
    sage: %timeit loads(dumps(S2))
    10000 loops, best of 3: 87 µs per loop
    sage: %timeit loads(dumps(S3))
    1000 loops, best of 3: 1.06 ms per loop
> Pickling of short bounded integer sequences has an overhead, but longer sequences are pickled faster, due to the more compact representation.
I think that one can say that the new code version results in an overall performance gain, and I suppose it is less frightening now.
comment:158 Changed 7 years ago by
 Commit changed from 47c639d55d3c07670b8b52a629f7cf86a8beb7ce to e3260e2d9f5bf0483770cc5235c00cd24c4eb310
Branch pushed to git repo; I updated commit sha1. New commits:
e3260e2  Typographical improvements

comment:159 Changed 7 years ago by
 Commit changed from e3260e2d9f5bf0483770cc5235c00cd24c4eb310 to aac3a049805ab29aea8a0d3439c1eadf6605eab1
Branch pushed to git repo; I updated commit sha1. New commits:
aac3a04  Fix cornercase in unpickling

comment:160 Changed 7 years ago by
 Commit changed from aac3a049805ab29aea8a0d3439c1eadf6605eab1 to b182ba27bdaeb0f066daf1b575db3588b3195f89
Branch pushed to git repo; I updated commit sha1. New commits:
b182ba2  Use mpn_l/rshift only with nonzero shift

comment:161 followup: ↓ 164 Changed 6 years ago by
Just randomly looking through this patch...
I don't like to put this in src/sage/misc. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset also should be in there). What do you think of src/sage/data_structures?
comment:162 followup: ↓ 165 Changed 6 years ago by
Also, instead of allocate_biseq(), I would use biseq_init() to be more analogous to mpz_init() and bitset_init(). More generally, replace foo_biseq() by biseq_foo().
comment:163 followup: ↓ 166 Changed 6 years ago by
Please don't cdef extern from "gmp.h"; use from sage.libs.gmp.mpn cimport * or something.
comment:164 in reply to: ↑ 161 ; followup: ↓ 169 Changed 6 years ago by
Replying to jdemeyer:
Just randomly looking through this patch...
I don't like to put this in src/sage/misc. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset should also be in there). What do you think of src/sage/data_structures?
I think this question has been discussed on another ticket that I authored. My impression from the discussion was that "structural" stuff should go either to sage/structure or sage/misc. It should be the former if it is about structures that only make sense in Sage (Parents, elements, ...), whereas sage/misc is for structures that would also make sense without Sage. I thought that bounded integer sequences belong to the latter, and I would also think that TripleDict and MonoDict would better be in sage/misc (the only reason for them being in sage/structure is the fact that they were introduced for coercion, which is Sage specific).
So, I think sage/misc is a good place. But I wouldn't mind introducing a third structural module, sage/data_structures. On a different ticket?
comment:165 in reply to: ↑ 162 Changed 6 years ago by
Replying to jdemeyer:
Also, instead of allocate_biseq(), I would use biseq_init() to be more analogous to mpz_init() and bitset_init(). More generally, replace foo_biseq() by biseq_foo().
Ok, makes sense.
comment:166 in reply to: ↑ 163 ; followup: ↓ 167 Changed 6 years ago by
Replying to jdemeyer:
Please don't cdef extern from "gmp.h"; use from sage.libs.gmp.mpn cimport * or something.
Is it available from there? If it is, then I should certainly do it in that way.
comment:167 in reply to: ↑ 166 ; followup: ↓ 170 Changed 6 years ago by
Replying to SimonKing:
Replying to jdemeyer:
Please don't cdef extern from "gmp.h"; use from sage.libs.gmp.mpn cimport * or something.
Is it available from there? If it is, then I should certainly do it in that way.
mp_bits_per_limb is not available from there; hence, I still import it from gmp.h.
For now I will not introduce a new sage.data_structures.
After running tests, I will push a commit that addresses your other comments.
comment:168 Changed 6 years ago by
 Commit changed from b182ba27bdaeb0f066daf1b575db3588b3195f89 to 1f5a53e7b23e406c553d789f83604cd4a7f90655
Branch pushed to git repo; I updated commit sha1. New commits:
1f5a53e  Change the naming schemes of functions according to conventions in gmp and bitset

comment:169 in reply to: ↑ 164 ; followups: ↓ 176 ↓ 177 Changed 6 years ago by
Replying to SimonKing:
On a different ticket?
It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else (requiring changing import statements and everything). If it were an existing module, things would be different.
The problem with src/sage/misc is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category. But it's clear that there are a lot of modules defining data structures in misc (but also other stuff like temporary_file). I think it makes sense to group these in src/sage/data_structures (or another name, whatever) instead of dumping everything in misc.
comment:170 in reply to: ↑ 167 Changed 6 years ago by
Replying to SimonKing:
mp_bits_per_limb is not available from there; hence, I still import it from gmp.h.
No, do cimport it from sage.libs.gmp.types and add the declaration there.
comment:171 Changed 6 years ago by
This documentation of ctypedef struct biseq should be expanded. When reading it, it is really not clear to me what the various fields mean (especially "bla&mask_item greps the item bla starts with" makes no sense) and how the data structure works. If you're trying too hard to get everything on one line, use
ctypedef struct biseq:
    # comment about data
    bitset_t data
    # comment about itembitsize
    # comment about itembitsize
    # comment about itembitsize
    mp_bitcnt_t itembitsize
    ...
comment:172 Changed 6 years ago by
 Status changed from needs_review to needs_work
comment:173 Changed 6 years ago by
first_bits_equal -> biseq_first_bits_equal
comment:174 followup: ↓ 178 Changed 6 years ago by
Also: I think you use int in a lot of places where you really want a long or mp_limb_t or mp_size_t or Py_ssize_t.
list_to_biseq seems to leak memory if the biseq_t was already initialized. I don't mind, but it should be documented, since it's not obvious without reading the code.
Also in list_to_biseq, the branch if ln>mp_bits_per_limb: leaks tmp.
comment:176 in reply to: ↑ 169 ; followup: ↓ 179 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
On a different ticket?
It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else
Except that the new ticket is not just about moving sage.misc.bounded_integer_sequences to sage.data_structures.bounded_integer_sequences, but moves a whole lot more.
On THIS ticket, it does not make sense to do the big move from sage.misc to sage.data_structures, since this ticket is about introducing bounded integer sequences.
comment:177 in reply to: ↑ 169 Changed 6 years ago by
Replying to jdemeyer:
The problem with src/sage/misc is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category.
Yeah, and most of the time this is stuff that does not really belong in Sage in the first place and should eventually be moved to upstream libraries. In some sense, it can be seen as a feature that it looks wrong there: it gives an incentive for developers to finally take on the task of actually moving the stuff upstream.
comment:178 in reply to: ↑ 174 Changed 6 years ago by
Replying to jdemeyer:
Also: I think you use int in a lot of places where you really want a long or mp_limb_t or mp_size_t or Py_ssize_t.
I tried to be consistent. However, I do not find documentation: What is the purpose of mp_size_t versus mp_bitcnt_t?
comment:179 in reply to: ↑ 176 ; followup: ↓ 181 Changed 6 years ago by
Replying to SimonKing:
On THIS ticket, it does not make sense to do the big move from
sage.misc
tosage.data_structures
Of course not. But I meant to add the new module bounded_integer_sequences
to sage.data_structures
(the result being that this would be the only module inside sage.data_structures
). Putting it in the correct place immediately is better than first putting it somewhere else and then moving it.
comment:180 Changed 6 years ago by
Strangely, in gmp.h, I see that mp_size_t is long long int; hence, it is a signed type, in contrast to C's size_t (which is unsigned).
comment:181 in reply to: ↑ 179 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
On THIS ticket, it does not make sense to do the big move from
sage.misc
tosage.data_structures
Of course not. But I meant to add the new module
bounded_integer_sequences
tosage.data_structures
(the result being that this would be the only module insidesage.data_structures
).
OK. This I could do.
comment:182 followup: ↓ 184 Changed 6 years ago by
Replying to jdemeyer:
I tried to be consistent.
The problem is that mp_bits_per_limb is about the number of bits in mp_limb_t, not the number of bits in int. And using int for indices is inconsistent with Python, which uses Py_ssize_t (which should be the same as mp_size_t).
comment:183 followup: ↓ 185 Changed 6 years ago by
I would say: use mp_size_t for indices, mp_limb_t for the actual integers inside your sequences.
comment:184 in reply to: ↑ 182 ; followup: ↓ 186 Changed 6 years ago by
Replying to jdemeyer:
Replying to jdemeyer:
I tried to be consistent.
The problem is that mp_bits_per_limb is about the number of bits in mp_limb_t, not the number of bits in int. And using int for indices is inconsistent with Python, which uses Py_ssize_t (which should be the same as mp_size_t).
OK. So, I should use mp_size_t for indices. But what should I use mp_bitcnt_t for?
comment:185 in reply to: ↑ 183 ; followup: ↓ 187 Changed 6 years ago by
Replying to jdemeyer:
I would say: use mp_size_t for indices, mp_limb_t for the actual integers inside your sequences.
If I am not mistaken, I always use mp_limb_t for the limbs. If you find a spot where this is not the case, please tell me.
Replying to SimonKing:
But what should I use mp_bitcnt_t for?
Any time you need a count of bits (which is always positive!)...
comment:187 in reply to: ↑ 185 ; followup: ↓ 190 Changed 6 years ago by
Replying to SimonKing:
If you find a spot where this is not the case, please tell me.
cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
     ^^^
comment:188 Changed 6 years ago by
I would say: when in doubt between mp_size_t and mp_bitcnt_t, use mp_size_t.
When in doubt between int and anything else: use the anything else.
comment:189 in reply to: ↑ 186 ; followup: ↓ 193 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
But what should I use mp_bitcnt_t for?
Any time you need a count of bits (which is always positive!)...
You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number b in limb number l, then both b and l should be mp_size_t? Currently, I try to consistently use mp_bitcnt_t for b and mp_size_t for l.
comment:190 in reply to: ↑ 187 ; followups: ↓ 192 ↓ 194 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
If you find a spot where this is not the case, please tell me.
cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
Rationale: This function does not return a limb, but only one item (and a limb may comprise many items). To keep them apart, I chose a different type. Moreover, I consider mp_limb_t as a type that belongs to gmp, but the return type of this function should belong to the "Python world".
comment:191 Changed 6 years ago by
 Cc mguaypaq removed
comment:192 in reply to: ↑ 190 Changed 6 years ago by
Replying to SimonKing:
I consider mp_limb_t as a type that belongs to gmp. But the return type of that function should belong to the "Python world".
int doesn't belong to the Python world either (don't confuse Cython int with Python int!)
comment:193 in reply to: ↑ 189 Changed 6 years ago by
Replying to SimonKing:
Replying to jdemeyer:
Replying to SimonKing:
But what should I use mp_bitcnt_t for?
Any time you need a count of bits (which is always positive!)...
You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number b in limb number l, then both b and l should be mp_size_t? Currently, I try to consistently use mp_bitcnt_t for b and mp_size_t for l.
Using mp_bitcnt_t for this is fine; when I said "index" I meant index as for Python's __getitem__.
comment:194 in reply to: ↑ 190 Changed 6 years ago by
Replying to SimonKing:
To keep them apart, I chose a different type.
If that's the main motivation, define a new type:
ctypedef mp_limb_t biseq_item_t
comment:195 Changed 6 years ago by
Also, if bound is of type mp_limb_t, then the following check is not needed at all:
mpz_init_set_ui(tmp, bound-1)
ln = mpz_sizeinbase(tmp, 2)
if ln > mp_bits_per_limb:
    raise ValueError("The integer bound {} does not fit into {}".format(bound, mp_bits_per_limb))
comment:196 followup: ↓ 197 Changed 6 years ago by
One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.
comment:197 in reply to: ↑ 196 ; followup: ↓ 199 Changed 6 years ago by
Replying to SimonKing:
One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.
That's not one more reason to use int, it's one less reason to use mp_limb_t. With mp_limb_signed_t (which isn't declared but should be in src/sage/libs/gmp/types.pxd), you shouldn't have this problem.
comment:198 Changed 6 years ago by
But with a better specification of the biseq struct (see comment:171) I might give a better recommendation about which types to use.
comment:199 in reply to: ↑ 197 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.
That's not one more reason to use int, it's one less reason to use mp_limb_t. With mp_limb_signed_t (which isn't declared but should be in src/sage/libs/gmp/types.pxd), you shouldn't have this problem.
There may be a problem with either int or mp_limb_signed_t: If the bound for the integer sequence is very large, then a single item may fill a whole mp_limb_t. However, this would not fit into a signed int/limb.
So, at the moment, it seems to me that ctypedef mp_limb_t biseq_item_t is preferred. Since the __repr__ method is not time-critical, I suggest doing the transformation from mp_limb_t to a nicely printed type there. I would accept that __getitem__ returns something that is not so nicely printed.
comment:200 Changed 6 years ago by
 Commit changed from 1f5a53e7b23e406c553d789f83604cd4a7f90655 to ef69d1e6eb30a9551bb45d1e91f6180454d78b86
Branch pushed to git repo; I updated commit sha1. New commits:
ef69d1e  Create sage.data_structures, move biseq into it, and amend types in biseq

comment:201 Changed 6 years ago by
 Commit changed from ef69d1e6eb30a9551bb45d1e91f6180454d78b86 to 83be138062f619a237e86650807c44b38eec94e5
Branch pushed to git repo; I updated commit sha1. New commits:
83be138  Reword documentation of biseq_s

comment:202 Changed 6 years ago by
 Status changed from needs_work to needs_review
I hope my preceding comments do address your concerns, Jeroen. One more thing: Bitset has cdef struct bitset_s and ctypedef bitset_s bitset_t[1]. To be consistent with it, I changed biseq into biseq_s, so that I now have cdef struct biseq_s and ctypedef biseq_s biseq_t[1].
comment:203 Changed 6 years ago by
In the "documentation" of biseq_s, this sentence is still way too vague to be useful: "Use bitsets to store the data, which in turn is based on GMP integers". Like: what is "the data" and how is it stored in the bitset?
And it's not really true that a bitset is based on GMP integers. It's just an array of limbs, but we use GMP/MPIR mpn_ functions for some manipulations.
comment:204 Changed 6 years ago by
 Status changed from needs_review to needs_work
comment:205 Changed 6 years ago by
Are you using anything from NTL? If not, why the include "sage/libs/ntl/decl.pxi"?
comment:206 Changed 6 years ago by
 Commit changed from 83be138062f619a237e86650807c44b38eec94e5 to 1fb819cb133e2b94bfc9222585b8762ee5efad21
Branch pushed to git repo; I updated commit sha1. New commits:
1fb819c  Remove a needless cimport, clarify documentation of biseq_s

comment:207 followup: ↓ 208 Changed 6 years ago by
 Status changed from needs_work to needs_review
The import from ntl has indeed been useless.
Is the description of biseq_s clear enough now?
comment:208 in reply to: ↑ 207 ; followup: ↓ 210 Changed 6 years ago by
Replying to SimonKing:
Is the description of biseq_s clear enough now?
Add that the items in the sequence are nonnegative integers and that itembitsize is in [1 .. GMP_LIMB_BITS] (assuming both these statements are true).
And I guess that mask_item equals limb_lower_bits_up(itembitsize), where limb_lower_bits_up is the function defined in src/sage/misc/bitset.pxi.
comment:209 Changed 6 years ago by
 Commit changed from 1fb819cb133e2b94bfc9222585b8762ee5efad21 to e166d38d8be72453d42325c5f6a73592ddeb3827
Branch pushed to git repo; I updated commit sha1. New commits:
e166d38  Further clarification of biseq_s doc.

comment:210 in reply to: ↑ 208 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
Is the description of biseq_s clear enough now?
Add that the items in the sequence are nonnegative integers and that itembitsize is in [1 .. GMP_LIMB_BITS] (assuming both these statements are true).
Done.
And I guess that mask_item equals limb_lower_bits_up(itembitsize), where limb_lower_bits_up is the function defined in src/sage/misc/bitset.pxi.
It could be that you just spotted a bug (corner case). I define it as ((<mp_limb_t>1)<<itembitsize)-1, which is limb_lower_bits_down. That is to say: If the itembitsize is mp_bits_per_limb (which should be allowed), then the mask is 0, but it should be all bits 1.
Trying to construct a test to see if it really is a bug...
comment:211 Changed 6 years ago by
Yes, it is a bug:
sage: B = BoundedIntegerSequence(2^32-1, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: B
<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
Thank you for spotting it!
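The arithmetic behind this corner case can be sketched in plain Python (a toy model, not the Sage code; mask_down/mask_up are hypothetical names, and the 64-bit limb width plus the wrap-around shift behaviour of common hardware are assumptions, since in C a shift by the full word width is undefined):

```python
LIMB_BITS = 64  # assumed limb width for the sketch

def mask_down(itembitsize, limb_bits=LIMB_BITS):
    # Models ((<mp_limb_t>1) << itembitsize) - 1 on hardware where the
    # shift count is reduced modulo the word size (in C this is undefined
    # behaviour; on x86 it wraps, so a shift by 64 acts like a shift by 0).
    return ((1 << (itembitsize % limb_bits)) - 1) & ((1 << limb_bits) - 1)

def mask_up(itembitsize, limb_bits=LIMB_BITS):
    # Corrected mask: all bits set when itembitsize == limb_bits.
    if itembitsize == limb_bits:
        return (1 << limb_bits) - 1
    return (1 << itembitsize) - 1

print(mask_down(30) == 2**30 - 1)  # True: fine for small item sizes
print(mask_down(64))               # 0: the buggy corner case
print(mask_up(64) == 2**64 - 1)    # True: all bits set, as intended
```

This is why a limb_lower_bits_up-style definition (full mask at the limb width) avoids the all-zeros output seen above.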
comment:212 followup: ↓ 216 Changed 6 years ago by
Hang on, I'm working on a reviewer patch.
comment:213 Changed 6 years ago by
Question: what is the reason that you allow a length of 0, requiring lots of special cases in the code? I.e. why not replace
if S.length:
    if R.length:
        bitset_realloc(R.data, S.data.size)
    else:
        bitset_init(R.data, S.data.size)
    mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)
elif R.length:
    bitset_free(R.data)
by
bitset_realloc(R.data, S.data.size)
mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)
comment:214 followup: ↓ 218 Changed 6 years ago by
I see, bitset doesn't allow a length of 0. Perhaps this (artificial?) limitation on bitsets should be removed first...
comment:215 Changed 6 years ago by
 Commit changed from e166d38d8be72453d42325c5f6a73592ddeb3827 to 3d293b96fa5a584684efbaae8fe667b885618797
Branch pushed to git repo; I updated commit sha1. New commits:
3d293b9  Fixing a corner case

comment:216 in reply to: ↑ 212 ; followup: ↓ 219 Changed 6 years ago by
comment:217 followup: ↓ 220 Changed 6 years ago by
...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.
comment:218 in reply to: ↑ 214 Changed 6 years ago by
Replying to jdemeyer:
I see, bitset doesn't allow a length of 0.
Exactly. I do need bounded integer sequences of length zero, but then I can not initialise the corresponding bitset (since it is required to have positive length). Therefore the special cases.
comment:219 in reply to: ↑ 216 Changed 6 years ago by
 Status changed from needs_review to needs_work
Replying to SimonKing:
Sorry, I pushed already.
No problem, it's just a small fix. Can you leave the branch alone for a while now?
comment:220 in reply to: ↑ 217 ; followup: ↓ 222 Changed 6 years ago by
Replying to jdemeyer:
...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.
I am not sure if I want this, since in some cases (at least in my applications) I ask self.data.size for the bitlength of the sequence, and thus I would need other special cases there.
OTOH, I just see that in these cases (e.g., concatenation of biseq) I already have special cases for length 0. So, it would not be a loss to reorganise the special casing...
comment:222 in reply to: ↑ 220 Changed 6 years ago by
Replying to SimonKing:
Replying to jdemeyer:
...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.
I am not sure if I want this, since in some cases (at least in my applications) I ask self.data.size for the bitlength of the sequence, and thus I would need other special cases there.
I would prefer not to assume that data.size == length * itembitsize. For example, it might turn out to be more efficient to store some extra bits (1 extra limb, for example).
comment:223 Changed 6 years ago by
I will rename list_to_biseq -> biseq_init_list if you don't mind.
I would also prefer to remove biseq_copy since it's a dangerous function: it assumes that the biseq has been allocated, but it throws away the current contents.
Better to split this operation in two: change biseq_copy to biseq_init_copy, assuming it was not allocated. If you really want something like biseq_copy, just use biseq_dealloc and biseq_init_copy.
comment:225 Changed 6 years ago by
 Dependencies set to #17195
comment:226 followup: ↓ 228 Changed 6 years ago by
Why should there be a dependency on cython upgrade?
comment:227 Changed 6 years ago by
 Dependencies changed from #17195 to #17195, #17196
comment:228 in reply to: ↑ 226 Changed 6 years ago by
Replying to SimonKing:
Why should there be a dependency on cython upgrade?
Only the newest Cython has declarations for the PySlice API, so you can use PySlice_GetIndicesEx() easily.
comment:229 followup: ↓ 230 Changed 6 years ago by
 Dependencies changed from #17195, #17196 to #17195
As far as I know, I don't use the bitset functions for which there is a requirement of equality of size. Hence, I don't think #17196 is a dependency.
comment:230 in reply to: ↑ 229 ; followup: ↓ 231 Changed 6 years ago by
Replying to SimonKing:
As far as I know, I don't use the bitset functions for which there is a requirement of equality of size.
Yes, you don't use them but you should use those functions. It will simplify a lot of things.
comment:231 in reply to: ↑ 230 ; followup: ↓ 232 Changed 6 years ago by
Replying to jdemeyer:
Yes, you don't use them but you should use those functions. It will simplify a lot of things.
For example, biseq_concat is broken in the case where (S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb.
comment:232 in reply to: ↑ 231 Changed 6 years ago by
Replying to jdemeyer:
Replying to jdemeyer:
Yes, you don't use them but you should use those functions. It will simplify a lot of things.
For example, biseq_concat is broken in the case where (S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb.
Really? OK, I'll try to find an example that exposes the problem.
comment:233 Changed 6 years ago by
sage: B = BoundedIntegerSequence(2^30-1, [1])
sage: B
<1>
sage: B+B
<1, 1>
Where's the problem?
comment:234 Changed 6 years ago by
Could you try again with a value larger than 1?
comment:235 Changed 6 years ago by
Here it is:
sage: B = BoundedIntegerSequence(2^30-1, [2^30-1])
sage: B
<1073741823>
sage: B+B
<1073741823, 3>
OK, this requires a fix.
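A toy model of the bookkeeping that a correct concatenation has to do (a sketch using Python's arbitrary-precision integers; concat_bits is a hypothetical helper, while the real biseq code must shift whole limb arrays, which is exactly where the unaligned-boundary case above goes wrong):

```python
def concat_bits(a_bits, a_len, b_bits, b_len):
    # Append sequence b after sequence a by shifting b's bits past a's
    # exact bit-length. Copying whole limbs without this shift is only
    # correct when a_len happens to be a multiple of the limb size.
    return a_bits | (b_bits << a_len), a_len + b_len

# Two 3-bit items: <5> followed by <3>; the result holds both, in order.
bits, nbits = concat_bits(0b101, 3, 0b011, 3)
print(bin(bits), nbits)  # 0b11101 6
```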
comment:236 followup: ↓ 237 Changed 6 years ago by
Do I understand correctly: At #17196, shift and comparison operations should be introduced that behave well wrt bitsizes, and then these operations should replace the mpn_... that I am using here.
comment:237 in reply to: ↑ 236 Changed 6 years ago by
 Dependencies changed from #17195 to #17195, #17196
comment:238 Changed 6 years ago by
 Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
 Modified changed from 10/22/14 14:14:46 to 10/22/14 14:14:46
comment:239 Changed 6 years ago by
 Commit changed from 3d293b96fa5a584684efbaae8fe667b885618797 to 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8
comment:240 Changed 6 years ago by
Last commit is totally work in progress.
comment:241 Changed 6 years ago by
 Commit changed from 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8 to ab2f0c2bf3b68e4e979650f5554d529e5b7ea625
Branch pushed to git repo; I updated commit sha1. New commits:
ab2f0c2  Various reviewer fixes

comment:242 Changed 6 years ago by
New commits:
ab2f0c2  Various reviewer fixes

comment:243 Changed 6 years ago by
 Commit changed from ab2f0c2bf3b68e4e979650f5554d529e5b7ea625 to 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a
comment:244 followup: ↓ 245 Changed 6 years ago by
What exactly does cython.overflowcheck do? Check whether the given arguments fit into the prescribed data types without truncation?
comment:245 in reply to: ↑ 244 ; followup: ↓ 249 Changed 6 years ago by
Replying to SimonKing:
What exactly does cython.overflowcheck do?
See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication totalbitsize = l * itemsize doesn't overflow.
 Commit changed from 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a to 8d16a61d2874a38e10233cdf434db3854bd5ee0d
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8d16a61  Lots of fixes for bounded integer sequences

comment:247 Changed 6 years ago by
 Commit changed from 8d16a61d2874a38e10233cdf434db3854bd5ee0d to 8d59d141b7090fb6e5f6529b968e224ec0e5f73b
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
8d59d14  Lots of fixes for bounded integer sequences

comment:248 Changed 6 years ago by
comment:249 in reply to: ↑ 245 ; followup: ↓ 251 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
What exactly does cython.overflowcheck do?
See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication totalbitsize = l * itemsize doesn't overflow.
Aha! So, it does not just check for an overflow of the arguments; it checks for overflows happening in the innards of the function. Cool!
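A rough Python analogue of the guard being discussed (checked_mul is a hypothetical name; Cython emits the equivalent test in generated C rather than anything like this):

```python
def checked_mul(l, itemsize, word_bits=64):
    # Mimics an overflow-checked totalbitsize = l * itemsize:
    # raise instead of silently wrapping past the (assumed) word size.
    total = l * itemsize
    if total >> word_bits:
        raise OverflowError("l * itemsize does not fit in a word")
    return total

print(checked_mul(100000, 32))  # 3200000, fits easily
```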
comment:250 Changed 6 years ago by
 Commit changed from 8d59d141b7090fb6e5f6529b968e224ec0e5f73b to d22c1ca23727402e376c7f772c2f51b8ed6af1f1
comment:251 in reply to: ↑ 249 Changed 6 years ago by
Replying to SimonKing:
So, it does not just check an overflow of the arguments
In fact, it does not check overflow of the arguments.
Overflow checking of the arguments should happen when calling the function, not in the function itself.
Also note that Cython always does overflow checking when converting from Python types to C types, but not by default between C types.
comment:252 followup: ↓ 253 Changed 6 years ago by
Is sagemath.org down? I can not build your branch, since downloading the new cython package fails.
New commits:
23762a3  Import data_structures into global namespace

4aafd2f  Merge branch 'ticket/17196' into HEAD

d22c1ca  Lots of fixes for bounded integer sequences

comment:253 in reply to: ↑ 252 Changed 6 years ago by
comment:254 followup: ↓ 256 Changed 6 years ago by
My intention for this ticket is to implement an additional function biseq_setitem() to set one item and then implement the biseq functions using either biseq_getitem()/biseq_setitem() or bitset functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.
comment:255 followup: ↓ 258 Changed 6 years ago by
In case that you missed it, I am copying this question I asked before:
In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?
comment:256 in reply to: ↑ 254 ; followup: ↓ 257 Changed 6 years ago by
Replying to jdemeyer:
My intention for this ticket is to implement an additional function biseq_setitem() to set one item and then implement the biseq functions using either biseq_getitem()/biseq_setitem() or bitset functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.
Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?
comment:257 in reply to: ↑ 256 ; followup: ↓ 260 Changed 6 years ago by
Replying to SimonKing:
Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?
I don't quite understand what you refer to. Which function do you think would become slower?
comment:258 in reply to: ↑ 255 ; followup: ↓ 259 Changed 6 years ago by
Replying to jdemeyer:
In case that you missed it, I am copying this question I asked before:
In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?
What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). It is documented, and it isn't random.
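The described behaviour can be sketched as follows (mangled_bound is a hypothetical helper name, used here only for illustration):

```python
def mangled_bound(bound):
    # Smallest power of two that is >= the given bound (bound >= 1),
    # i.e. the modulus that items are reduced by.
    return 1 << (bound - 1).bit_length()

print(mangled_bound(5))       # 8: bound 5 is mangled up to 2^3
print(13 % mangled_bound(5))  # 5: an item 13 is stored as its remainder
```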
comment:259 in reply to: ↑ 258 ; followup: ↓ 261 Changed 6 years ago by
Replying to SimonKing:
Replying to jdemeyer:
In case that you missed it, I am copying this question I asked before:
In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?
What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). It is documented, and it isn't random.
OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?
comment:260 in reply to: ↑ 257 ; followup: ↓ 270 Changed 6 years ago by
Replying to jdemeyer:
Replying to SimonKing:
Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?
I don't quite understand what you refer to. Which function do you think would become slower?
BoundedIntegerSequence.__iter__ and biseq_to_list would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by >> in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by biseq_getitem, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item.
I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb >>.
comment:261 in reply to: ↑ 259 ; followup: ↓ 271 Changed 6 years ago by
Replying to jdemeyer:
OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?
You mean one should just raise an error? OK, that's another possibility.
comment:262 Changed 6 years ago by
Why did you introduce
assert S1.itembitsize == S2.itembitsize
?
I think that generally it is not the job of boilerplate functions to do bound checking. If you want to do bound checking, then do so on the level of BoundedIntegerSequence, but not on the level of biseq_t.
comment:263 followup: ↓ 268 Changed 6 years ago by
Perhaps we should consider changing the hash function for bitsets (see the comments on the hash of bounded integer sequences that you removed). But I suppose this should not be done on this ticket.
My summary:
- We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with the exception of the one assert that you introduced. I vote for removing the assert.
- I don't think iteration should use biseq_getitem to extract items one by one. Instead, the current idea ("shift enough data so that one limb is full and extract items from this single limb") should be preserved.
- I like your change to biseq_getitem, where you avoid using mpn_rshift for just one/two limbs. I think the same change could be done in biseq_slice in the case step!=1, and also in iteration.
Other than that, your changes seem fine to me. I suggest to do the "replacement of one/two limb shifts", unless you beat me to it.
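The "fill one limb, then extract items from it" idea can be modelled in Python, with one arbitrary-precision int standing in for a limb (pack/unpack are hypothetical helpers for illustration; the point is that all items of a limb come out with cheap single-word shifts instead of one mpn_rshift per item):

```python
def pack(items, itembitsize):
    # Pack several items into a single "limb", lowest item first.
    limb = 0
    for i, x in enumerate(items):
        limb |= x << (i * itembitsize)
    return limb

def unpack(limb, itembitsize, n):
    # Extract all n items from one limb with plain shifts and a mask.
    mask = (1 << itembitsize) - 1
    out = []
    for _ in range(n):
        out.append(limb & mask)
        limb >>= itembitsize
    return out

print(unpack(pack([3, 1, 2], 2), 2, 3))  # [3, 1, 2]
```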
comment:264 Changed 6 years ago by
 Reviewers set to Simon King
comment:265 Changed 6 years ago by
Concerning the assert: The only place in which biseq_init_concat is used does in fact check compatibility of bounds, and raises a ValueError if they are not compatible. Hence, the assert statement is redundant.
comment:266 Changed 6 years ago by
 Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820
comment:267 Changed 6 years ago by
 Commit changed from d22c1ca23727402e376c7f772c2f51b8ed6af1f1 to 41c40cf2c59734e7e665ba9c2350da2930c9cc8e
 Status changed from needs_work to needs_review
From my perspective, it is good to go...
New commits:
41c40cf  Don't use mpn_rshift for shifting two limbs

comment:268 in reply to: ↑ 263 Changed 6 years ago by
Replying to SimonKing:
 We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with exception of the one assert that you introduced. I vote for removing the assert.
OK then, no problem.
comment:269 followup: ↓ 275 Changed 6 years ago by
Let's do some benchmarks:
 With the latest commit:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 965 µs per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.73 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 1.1 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.91 ms per loop
 With 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 3.87 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3.68 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 4.47 ms per loop
comment:270 in reply to: ↑ 260 Changed 6 years ago by
Replying to SimonKing:
BoundedIntegerSequence.__iter__
and biseq_to_list
would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by >>
in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by biseq_getitem
, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item. I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb
>>
.
As you can see, I implemented biseq_getitem()
much faster than that, so I doubt there is still a slowdown for using repeated calls to biseq_getitem()
.
comment:271 in reply to: ↑ 261 Changed 6 years ago by
comment:272 followup: ↓ 273 Changed 6 years ago by
I just notice that you removed biseq_to_list
. Why? Is there no speed penalty for using iterator instead?
comment:273 in reply to: ↑ 272 Changed 6 years ago by
Replying to SimonKing:
I just notice that you removed
biseq_to_list
.
Just use [biseq_getitem_py(self.data, i) for i in range(self.data.length)]
comment:274 Changed 6 years ago by
Also, I think that biseq_to_list
is something which belongs in the Python interface, not the C interface.
comment:275 in reply to: ↑ 269 Changed 6 years ago by
I only see this comment after posting my previous comment.
Replying to jdemeyer:
Let's do some benchmarks:
 With the latest commit:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 965 µs per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.73 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 1.1 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.91 ms per loop
 With 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 3.87 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3.68 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 4.47 ms per loop
OK, excellent, then I think it is ok to drop biseq_to_list.
comment:276 Changed 6 years ago by
If I understand correctly, the only remaining issue is whether we silently truncate the input, or raise an error if the given items are too large. I have no preference here, so you may make it a reviewer patch.
comment:277 followup: ↓ 278 Changed 6 years ago by
Sigh. After merging this into #16453, I get several segmentation faults.
comment:278 in reply to: ↑ 277 Changed 6 years ago by
comment:279 followup: ↓ 282 Changed 6 years ago by
I am going to leave this ticket alone for at least a few days.
I do think there is still a lot of room for improvement, mainly simplification:
 Implement
biseq_setitem()
and use that in a lot of places.  Implement
mpn_equal_bits_shifted()
to compare bit patterns efficiently for equality and use this for biseq_max_overlap
and biseq_contains
.
comment:280 Changed 6 years ago by
 Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
 Modified changed from 10/23/14 14:20:11 to 10/23/14 14:20:11
comment:281 Changed 6 years ago by
 Commit changed from 41c40cf2c59734e7e665ba9c2350da2930c9cc8e to 9056cf636b65d30c5c89bc151c7740c1ea1143d3
New commits:
9056cf6  Fix handling of bound in biseq_init_list

comment:282 in reply to: ↑ 279 Changed 6 years ago by
Replying to jdemeyer:
 Implement
biseq_setitem()
and use that in a lot of places.
BoundedIntegerSequence
was (originally) thought of as being immutable (it is hashable). Hence, so far, I only see application in biseq_init_list
and biseq_slice
. OK, that's something...
 Implement
mpn_equal_bits_shifted()
to compare bit patterns efficiently for equality and use this for biseq_max_overlap
and biseq_contains
.
+1.
comment:283 followup: ↓ 284 Changed 6 years ago by
Would it be acceptable that biseq_setitem(S,index,item)
assumes that S[item]
is zero before the assignment? This condition holds in the two applications I can think of.
Of course, I could additionally provide a function that sets a previously existing item to zero.
comment:284 in reply to: ↑ 283 Changed 6 years ago by
Replying to SimonKing:
Would it be acceptable that
biseq_setitem(S,index,item)
assumes that S[item]
is zero before the assignment?
Then I would call it biseq_inititem(S, index, item)
.
comment:285 Changed 6 years ago by
 Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820
comment:286 followup: ↓ 289 Changed 6 years ago by
 Commit changed from 9056cf636b65d30c5c89bc151c7740c1ea1143d3 to e9c779ae629dbec356bb8546e3974e2a67050051
It will be more efficient to declare biseq_inititem
as cdef inline void
.
New commits:
e9c779a  Simplify code by using biseq_getitem/biseq_inititem

comment:287 Changed 6 years ago by
Concerning enumerate()
: I have no idea if Cython optimizes this. If not, you had better just manually keep track of index
.
comment:288 followup: ↓ 290 Changed 6 years ago by
Why did you change if item_limb > bound
to if item > bound
? The former is a C comparison, therefore faster.
comment:289 in reply to: ↑ 286 Changed 6 years ago by
Replying to jdemeyer:
It will be more efficient to declare
biseq_inititem
as cdef inline void
.
More generally: if you have cdef
or cpdef
functions returning nothing and you don't require exception handling, then using void
as return type is more efficient (without it, the functions return the Python value None
).
comment:290 in reply to: ↑ 288 ; followup: ↓ 291 Changed 6 years ago by
Replying to jdemeyer:
Why did you change
if item_limb > bound
to if item > bound
? The former is a C comparison, therefore faster.
Sure. But you did item_limb = item
without testing whether item > bound
, which means that item_limb could end up below the bound. Hence, you'd have the random truncation that you wanted to avoid.
comment:291 in reply to: ↑ 290 Changed 6 years ago by
Replying to SimonKing:
But you did
item_limb = item
without testing whether item > bound
, which means that item_limb could end up below the bound.
Not true because the assignment item_limb = item
is a Python -> C conversion which is always checked:
sage: BoundedIntegerSequence(100, [2^256])
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-ba7f864a5712> in <module>()
----> 1 BoundedIntegerSequence(Integer(100), [Integer(2)**Integer(256)])
/usr/local/src/sageconfig/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence.__init__ (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7798)()
/usr/local/src/sageconfig/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_list (build/cythonized/sage/data_structures/bounded_integer_sequences.c:5706)()
OverflowError: long int too large to convert
(perhaps add this as doctest)
comment:292 Changed 6 years ago by
I repeated the timings from comment:157, but still with the item<=bound
test (so, initialisation time might improve).
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.45 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
100000 loops, best of 3: 2.1 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 4.28 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
1000 loops, best of 3: 300 µs per loop
> Became much slower
sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.44 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.18 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.29 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 286 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 697 ns per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.2 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.78 µs per loop
sage: %timeit x = S3.list()
10000 loops, best of 3: 135 µs per loop
> not very conclusive
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 815 ns per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 846 ns per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 845 ns per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 1.62 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 1.55 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 49.8 µs per loop
> Became much faster
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 346 ns per loop
> Became perhaps slower.
What does that all mean?
For my applications not very much, actually, because what I care about is not initialisation from lists and direct item access and not slices with step different from one, but I do care about:
 Testing for subsequence containment, comparison and hash
 Slicing with step one
 Concatenation.
So, I leave it up to you whether clarity of code is more important than the above slowdown...
comment:293 followup: ↓ 295 Changed 6 years ago by
If you do timings, please keep in mind 286.
comment:294 Changed 6 years ago by
Right. When using item_limb
, the timings for initialisation from lists become good:
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.27 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.51 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
1000000 loops, best of 3: 1.88 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 65.5 µs per loop
comment:295 in reply to: ↑ 293 Changed 6 years ago by
comment:296 Changed 6 years ago by
 Commit changed from e9c779ae629dbec356bb8546e3974e2a67050051 to 7d258591bab37466eb6392443ed2e84cf699102c
comment:297 followup: ↓ 303 Changed 6 years ago by
I know it's really a detail, but I intentionally used repr(item)
to display the error message
raise ValueError("list item %r larger than %s"%(item, bound) )
This way, the error message refers to the list item as given by the user, which might be some strange Python type which prints differently.
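The difference is easy to demonstrate: %r uses repr, which distinguishes types that render identically under %s.

```python
# Why %r matters in the error message: a string that looks like a
# number prints the same as the number with %s, but not with %r.
item = '7'
assert "list item %s larger than 4" % (item,) == "list item 7 larger than 4"
assert "list item %r larger than 4" % (item,) == "list item '7' larger than 4"
```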
comment:298 Changed 6 years ago by
Something else: the implementation of biseq_getitem_py
using PyInt_FromSize_t
gives a good reason to have biseq_item_t == size_t
. Because of this, I would propose to remove the biseq_item_t
type again and replace it by an explicit size_t
.
comment:299 Changed 6 years ago by
 Commit changed from 7d258591bab37466eb6392443ed2e84cf699102c to 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e
Branch pushed to git repo; I updated commit sha1. New commits:
8a10b73  Use plain size_t, not biseq_item_t

comment:300 Changed 6 years ago by
 Commit changed from 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e to fa7cde09b93359371f236ce6da13bb8f5a947a08
Branch pushed to git repo; I updated commit sha1. New commits:
fa7cde0  Use original data in error message

comment:301 Changed 6 years ago by
 Commit changed from fa7cde09b93359371f236ce6da13bb8f5a947a08 to 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb
Branch pushed to git repo; I updated commit sha1. New commits:
6e03f19  Code and speed improvement for biseq containment test

comment:302 Changed 6 years ago by
I have implemented mpn_equal_bits_shifted
. I do not use mpn_rshift
followed by mpn_cmp
, since this would mean to touch each limb twice (once
for shifting, once for comparison), even if the difference occurs early in the
sequence. Of course, using such a function simplifies the code of
biseq_contains
and biseq_max_overlap
considerably.
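A word-level model of the interface may help: compare n bits of one number against a shifted window of another, one machine word at a time, stopping at the first mismatch. (Hypothetical Python names; the real mpn_equal_bits_shifted interleaves the shift with the comparison per limb instead of shifting everything up front, which is exactly the "touch each limb once" point made above.)

```python
# Model of mpn_equal_bits_shifted: does b equal (a >> shift) on the
# lowest n bits? Compare word by word with an early exit, so a
# mismatch near the start never reads the remaining words.
WORD = 64

def equal_bits_shifted(a, b, n, shift):
    a >>= shift  # done up front here; GMP code shifts limb by limb
    for off in range(0, n, WORD):
        width = min(WORD, n - off)
        mask = (1 << width) - 1
        if (a >> off) & mask != (b >> off) & mask:
            return False  # early exit on first differing word
    return True

assert equal_bits_shifted(0b101100, 0b1011, 4, 2)
assert not equal_bits_shifted(0b101100, 0b1111, 4, 2)
```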
Here I redo the containment tests from comment:157, giving timings with the new code and (in brackets) with the previous code (I would not directly compare with comment:157, since the previous code differs too much from the code back at comment:157).
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 190 ns per loop  (384 ns per loop)
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 282 ns per loop  (519 ns per loop)
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 644 ns per loop  (2.1 µs per loop)
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 50.2 µs per loop  (2.36 ms per loop)
So, there is a very clear improvement, and this time it concerns a function that I use very extensively in my applications.
Jeroen, I think that with the new commit I addressed all of your concerns, hence, for now I will move on to the followup tickets (i.e., cythoned quiver paths and the noncommutative F5 algorithm that is not on trac yet).
comment:303 in reply to: ↑ 297 Changed 6 years ago by
 Status changed from needs_review to needs_work
(never mind)
comment:304 Changed 6 years ago by
 Status changed from needs_work to needs_review
comment:305 Changed 6 years ago by
__iter__
and biseq_index
should be implemented in terms of biseq_getitem()
(or a good reason given why this is a bad idea).
comment:306 Changed 6 years ago by
 Status changed from needs_review to needs_work
comment:307 Changed 6 years ago by
In biseq_init_list
, iteration is about 10% faster when written the following way:
for item in data:
    item_limb = item
    index += 1
instead of
for index from 0 <= index < R.length:
    item_limb = data[index]
comment:308 Changed 6 years ago by
biseq_clearitem
must be added to the .pxd
file
comment:309 followup: ↓ 320 Changed 6 years ago by
In biseq_slice
, the case step == 1
should also be implemented using bitset primitives.
comment:310 followup: ↓ 311 Changed 6 years ago by
Should I make these changes or will you? I'm just asking to avoid duplicate work.
comment:311 in reply to: ↑ 310 Changed 6 years ago by
Replying to jdemeyer:
Should I make these changes or will you? I'm just asking to avoid duplicate work.
I will try today.
comment:312 followup: ↓ 314 Changed 6 years ago by
While I am at it, I will also make biseq_max_overlap
, biseq_index
and biseq_contains
return mp_size_t
, not int
. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be mp_size_t
.
comment:313 followup: ↓ 315 Changed 6 years ago by
Currently, when calling B.index(i)
, i
is compared modulo the B.bound()
with the items of B
. Taking i
modulo the bound is currently done in biseq_index
. I suggest to move this to B.index
, since generally I want to avoid bound checking etc. in the boilerplate functions.
comment:314 in reply to: ↑ 312 Changed 6 years ago by
Replying to SimonKing:
While I am at it, I will also make
biseq_max_overlap
, biseq_index
and biseq_contains
return mp_size_t
, not int
. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be mp_size_t
.
I hadn't noticed, but yes: that should certainly be changed.
comment:315 in reply to: ↑ 313 ; followup: ↓ 318 Changed 6 years ago by
Replying to SimonKing:
Currently, when calling
B.index(i)
, i
is compared modulo the B.bound()
with the items of B
. Taking i
modulo the bound is currently done in biseq_index
. I suggest to move this to B.index
I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.
comment:316 followup: ↓ 317 Changed 6 years ago by
In biseq_index
, use size_t
for item
.
comment:317 in reply to: ↑ 316 Changed 6 years ago by
Replying to jdemeyer:
In
biseq_index
, usesize_t
foritem
.
Since biseq_getitem
returns a size_t
, or is there another reason?
comment:318 in reply to: ↑ 315 Changed 6 years ago by
Replying to jdemeyer:
I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.
OK, but I'll do this test in BoundedIntegerSequence.index
, not in biseq_index
.
comment:319 Changed 6 years ago by
Should I use biseq_getitem_py
both in BoundedIntegerSequence.__iter__
and BoundedIntegerSequence.list
? Or should one of them use biseq_getitem
for efficiency?
comment:320 in reply to: ↑ 309 Changed 6 years ago by
Replying to jdemeyer:
In
biseq_slice
, the casestep == 1
should also be implemented using bitset primitives.
Problem: We would need a bitset primitive that shifts a part of a bitset S. Currently, bitset_rshift
shifts all of S, not just a part of S.
I see four ways to proceed:
 Write a new primitive
bitset_rshift_subset
, which means code duplication  Add an optional argument to
bitset_rshift
, which means that all places using this function must be changed.  Change
bitset_rshift
so that no assumption on the size of source and target are needed: We just shift the minimum of the number of limbs that is available in the source and of the number of limbs that fits into the target.  Do not use bitset primitives in
biseq_slice
.
comment:321 followup: ↓ 326 Changed 6 years ago by
PS: My favourite would be to modify bitset_rshift
(and bitset_lshift
) to make it robust against reading/writing out of bound, in the sense that bitset_rshift(R,S,n)
moves a subset of S that fits into R.
comment:322 Changed 6 years ago by
 Commit changed from 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb to b5b066dc218d7fb6e5e13eb19cce4720d4a2d057
Branch pushed to git repo; I updated commit sha1. New commits:
b5b066d  Code simplification for biseq_*, bound check for bitset shifts

comment:323 Changed 6 years ago by
 Commit changed from b5b066dc218d7fb6e5e13eb19cce4720d4a2d057 to 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b
Branch pushed to git repo; I updated commit sha1. New commits:
43f78ad  Adding a reference to trac

comment:324 Changed 6 years ago by
In the new commits I addressed (I hope) all issues that we discussed yesterday and today.
Timings without the new commits:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.64 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.57 µs per loop
sage: %timeit B[2:2]
1000000 loops, best of 3: 1.13 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 13 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 31.5 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 55 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.26 µs per loop
And with the new commit:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.6 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.62 µs per loop
sage: %timeit B[2:2]
1000000 loops, best of 3: 1.12 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 11.5 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 30.8 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 64.9 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.27 µs per loop
Apparently, creation from a list became faster, iteration over a long list became 10% slower (but in my applications, I do not iterate over the items of a bounded integer sequence, hence, I don't care about the slowdown), and everything else did not change significantly.
And the code became clearer, which is a plus. I hope you'll find that my
changes to bitset_rshift
and bitset_lshift
make sense.
comment:325 Changed 6 years ago by
 Status changed from needs_work to needs_review
comment:326 in reply to: ↑ 321 Changed 6 years ago by
Given the new code, the comment
Items will not just be compared by congruence modulo the bound of the sequence, but by equality.
no longer makes sense.
comment:327 Changed 6 years ago by
I would replace the doc
If the result of the shift would exceed the size of ``r``, then it will be cut.
by something like
There are no assumptions on the sizes of ``a`` and ``r``. Bits which are shifted outside of the resulting bitset are discarded.
comment:328 Changed 6 years ago by
In
raise ValueError("list item {} larger than {}".format(data[index], bound) )
replace data[index]
by item
.
comment:329 Changed 6 years ago by
 Commit changed from 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b to 7a0dd469527c8a4f39d8d891703fcf5feecda9a1
Branch pushed to git repo; I updated commit sha1. New commits:
7a0dd46  Some improvements of the doc

comment:330 Changed 6 years ago by
comment:331 Changed 6 years ago by
 Status changed from needs_review to needs_work
I found a bug (probably in slicing):
sage: B1 = BoundedIntegerSequence(8, [0,7])
sage: B2 = BoundedIntegerSequence(8, [2,1,4])
sage: B1[0:1]+B2
<0, 7, 1, 4>
sage: B1[0:1]
<0>
My diagnosis is that the top bits of the slice (those exceeding the size) have not been cleared. This doesn't show when printing it, but concatenation does rely on the top bits being cleared.
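The diagnosis can be illustrated on a single word (hypothetical pure-Python version of what bitset_fix does on the last limb of a bitset):

```python
# After copying limbs for a slice, the bits above the logical length
# may still hold stale data; clearing them is what bitset_fix does.
# A concatenation that ORs the next sequence into those bits would
# otherwise pick up the garbage.
def fix(limb, used_bits):
    """Clear all bits of `limb` at position used_bits and above."""
    return limb & ((1 << used_bits) - 1)

dirty = 0b1110   # low bit is the slice's only item; top bits are stale
assert fix(dirty, 1) == 0
assert fix(0b101, 2) == 0b01
```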
comment:332 Changed 6 years ago by
 Commit changed from 7a0dd469527c8a4f39d8d891703fcf5feecda9a1 to 6723c7e44dbae3118e3dff259fd0e633915b53c9
Branch pushed to git repo; I updated commit sha1. New commits:
6723c7e  Fix highest bits of bitsets after fixing

comment:333 Changed 6 years ago by
 Status changed from needs_work to needs_review
comment:334 Changed 6 years ago by
For consistency, biseq_slice
should be renamed biseq_init_slice
.
comment:335 Changed 6 years ago by
 Reviewers changed from Simon King to Jeroen Demeyer, Simon King
Modulo this comment about biseq_init_slice
, I think the public biseq interface looks good.
I have not looked at all the changes of the last few days and I don't feel like doing that now. I do plan to review this ticket in some time with a fresh look (of course, other reviewers are always welcome). In any case, I think the current branch works well enough that you can use it for your other tickets.
comment:336 Changed 6 years ago by
 Commit changed from 6723c7e44dbae3118e3dff259fd0e633915b53c9 to b6ef6d5c5faffe4059b1539f00e25dd98b9481c0
Branch pushed to git repo; I updated commit sha1. New commits:
b6ef6d5  Rename biseq_slice > biseq_init_slice

comment:337 Changed 6 years ago by
In biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop, int step)
, the step
argument should also be mp_size_t
.
comment:338 Changed 6 years ago by
 Commit changed from b6ef6d5c5faffe4059b1539f00e25dd98b9481c0 to 0e51c4a8ae1720e301b4d57ee51289de475bf51d
Branch pushed to git repo; I updated commit sha1. New commits:
0e51c4a  Change the type of an argument

comment:339 Changed 6 years ago by
 Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
 Modified changed from 10/30/14 10:55:56 to 10/30/14 10:55:56
comment:340 Changed 6 years ago by
 Commit changed from 0e51c4a8ae1720e301b4d57ee51289de475bf51d to d46a22d394c0c3556bd297427614bbc43140efb3
Resolved easy merge conflict.
New commits:
d46a22d  Merge remotetracking branch 'origin/develop' into ticket/15820

comment:341 Changed 6 years ago by
comment:342 Changed 6 years ago by
PS: The errors that some patchbots report seem unrelated. And I don't see what the coverage script is complaining about. Coverage of the new stuff is 100%.
comment:343 Changed 6 years ago by
 Commit changed from d46a22d394c0c3556bd297427614bbc43140efb3 to 7e5f21754e6c056d71609830729c093ba09b56a6
Branch pushed to git repo; I updated commit sha1. New commits:
7e5f217  Reorganize logic of bitset shifts

comment:344 Changed 6 years ago by
 Commit changed from 7e5f21754e6c056d71609830729c093ba09b56a6 to 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664
comment:345 Changed 6 years ago by
 Commit changed from 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664 to 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf
Branch pushed to git repo; I updated commit sha1. New commits:
93e64fd  Always raise OverflowError if list item is out of bounds

comment:346 Changed 6 years ago by
 Commit changed from 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf to 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
411fe4e  Simplify/fix the logic of some biseq functions

comment:347 Changed 6 years ago by
 Commit changed from 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8 to 8db9f6542a54ead67c1df3d069103b620e85eea6
comment:348 Changed 6 years ago by
My review of this is done now. I'll let you look at my commits.
comment:349 Changed 6 years ago by
I have made one important functional change: you should change biseq_max_overlap(A, B)
to biseq_reverse_contains(B, A, 1)
in the followup tickets.
I also fixed a few bugs and changed some exceptions but there should be no other functional changes.
comment:350 followup: ↓ 351 Changed 6 years ago by
 Status changed from needs_review to needs_work
 Work issues set to interrupt handling
We are still missing sig_*
functions to allow interrupts. But I prefer for you to review my changes before adding those.
comment:351 in reply to: ↑ 350 ; followup: ↓ 358 Changed 6 years ago by
Dear Jeroen,
thank you for reviewing!
Replying to jdemeyer:
We are still missing
sig_*
functions to allow interrupts. But I prefer for you to review my changes before adding those.
 You changed the location of the (old) function
bitset_fix
. Why? Won't that increase the likelihood of a merge conflict?
 When reading the function name
biseq_reverse_contains
, I'd expect that it either checks for containment of a reverse subsequence, or it checks for containment of a subsequence starting at the end of the supersequence. Neither is the case. What this function determines is an overlap of the end of the first and the beginning of the second argument. That said, I don't think the new function name is worse than the old.
The other changes seem at least reasonable, some are needed. Thanks for spotting it.
In any case, there will be a merge conflict with the next followup ticket. But I guess it can hardly be avoided. I think we can now take care of signal handling.
comment:352 Changed 6 years ago by
And one thing is not correct in the description of biseq_reverse_contains
:
Return ``i`` such that the bounded integer sequence ``S1`` starts with the sequence ``S2[i:]``, where ``start <= i < S2.length``, or return ``-1`` if no such ``i`` exists.
If you apply the function (whatever it is named) to <2,3,2,3,1>
and <1,2,3,2,3>
(I just notice that you also exchanged the rôles of S1
and S2
!), then I want it to yield the maximal overlap: It should return 1, not 3. That's not clear from the new description.
comment:353 Changed 6 years ago by
Hmmmm. I am less and less happy about the "contains" in the name biseq_reverse_contains
. Neither is the first argument contained in the second nor the second contained in the first. And even if it were: We are not interested in mere containment, but we want to know whether a terminal segment of one argument coincides with an initial segment of the other argument.
comment:354 followups: ↓ 359 ↓ 364 Changed 6 years ago by
What do you think about the name biseq_start_of_overlap
, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?
comment:355 Changed 6 years ago by
 Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820
comment:356 Changed 6 years ago by
 Commit changed from 8db9f6542a54ead67c1df3d069103b620e85eea6 to 63d2693eb4915632e3cb806730a4eb56dc26008e
For the record: With git trac push 15820
, only the branch of this ticket was changed, but not the sha1. Hence, I have to change it manually.
comment:357 Changed 6 years ago by
By the way: Note that I've formulated error messages such as ValueError: BoundedIntegerSequence.index(x): x(=32) not in sequence
by taking tuples as a good example:
sage: (1,2,3).index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-e2928ab3b336> in <module>()
----> 1 (Integer(1),Integer(2),Integer(3)).index(Integer(4))
ValueError: tuple.index(x): x not in tuple
Your error message is more similar to what is done for lists:
sage: [1,2,3].index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-e163a1d1b274> in <module>()
----> 1 [Integer(1),Integer(2),Integer(3)].index(Integer(4))
ValueError: 4 is not in list
But that's not more than a sidenote. I accept your change.
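Both conventions can be observed in plain Python as well; the exact wording is CPython's and may differ between versions, so the messages are printed rather than asserted:

```python
# tuple.index mentions the container type in its message,
# while list.index mentions the missing value instead.
for container in [(1, 2, 3), [1, 2, 3]]:
    try:
        container.index(4)
    except ValueError as e:
        print(type(container).__name__, "->", e)
```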
comment:358 in reply to: ↑ 351 Changed 6 years ago by
Replying to SimonKing:
You changed the location of the (old) function bitset_fix. Why?
Because I think the new location is better than the old one, it seemed out of place in the list of functions to generate limb patterns.
Won't that increase the likelihood of a merge conflict?
Well, every change increases the likelihood of a merge conflict... But I don't know of any other tickets changing bitset code, so it won't be that bad.
comment:359 in reply to: ↑ 354 Changed 6 years ago by
Replying to SimonKing:
What do you think about the name biseq_start_of_overlap, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?
Well, the thing that bothered me most about the old biseq_max_overlap function was the implicit start index of 1, which is unexpected (why 1?) and couldn't be changed (unlike biseq_contains).
I changed the names and ordering of the arguments to make it more clear that biseq_contains and biseq_reverse_contains are in fact very similar: the first finds i such that S1[i:] starts with S2, the second finds i such that S1 starts with S2[i:].
I'm certainly open for changes, I just would like the functions to be as intuitive as possible for people not knowing about your application of quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.
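A pure-Python model may make the symmetry between the two searches explicit. The function names below mirror the C-level ones, but the code is only an illustrative sketch, not the actual implementation:

```python
def contains(S1, S2, start=0):
    """Smallest index i >= start with S1[i:i+len(S2)] == S2, else -1."""
    for i in range(start, len(S1) - len(S2) + 1):
        if S1[i:i + len(S2)] == S2:
            return i
    return -1

def reverse_contains(S1, S2, start=0):
    """Smallest index i >= start such that S1 starts with S2[i:], else -1."""
    for i in range(start, len(S2)):
        tail = S2[i:]
        if S1[:len(tail)] == tail:
            return i
    return -1

print(contains([0, 1, 2, 3], [2, 3]))          # → 2
print(reverse_contains([2, 3, 9], [1, 2, 3]))  # → 1
```

Note that with start index 0, `reverse_contains(S1, S2, 0) == 0` is exactly the case where S1 starts with all of S2, which is the point made below about the old implicit start index of 1.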
comment:360 Changed 6 years ago by
 Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
 Modified changed from 12/05/14 09:28:15 to 12/05/14 09:28:15
comment:361 Changed 6 years ago by
Apparently the trac pages are not available, because of an internal error. So, I'm trying to comment using the sage-dev script.
You wrote:
I'm certainly open for changes, I just would like the functions to be as intuitive as possible for people not knowing about your application of quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.
Here I disagree. If I have two lists A and B, then the statement "A overlaps with B" makes immediate sense to me. The only open question is whether "A overlaps with B" means that the overlap of the two lists is supposed to be at the end of A or at the end of B.
comment:362 Changed 6 years ago by
 Commit changed from 63d2693eb4915632e3cb806730a4eb56dc26008e to 0a03d71d9df26c42cf91dab3df51ca549c434718
New commits:
0a03d71  More descriptive function name for bounded integer sequences

comment:363 Changed 6 years ago by
To reduce unneeded merges, I have cherrypicked your commit on top of my branch.
comment:364 in reply to: ↑ 354 Changed 6 years ago by
Replying to SimonKing:
What do you think about switching back to the old rôle of the arguments
I think it's a bad idea.
Look at the following (now removed) code from you:
if other.startswith(self):
    return self
cdef mp_size_t i = biseq_max_overlap(self.data, other.data)
Note that the case i == 0 of biseq_max_overlap(B, A) corresponds to biseq_startswith(A, B). An unexpected reordering in my opinion.
In my proposal, the three functions biseq_startswith, biseq_contains and biseq_reverse_contains (whatever the name) all check whether some part of S1 starts with some part of S2. The only thing that changes is which parts of S1 and S2 are considered.
comment:365 Changed 6 years ago by
Better proposal: rename biseq_reverse_contains as biseq_startswith_tail. I think that's clear: check whether a sequence starts with the tail part of a sequence.
comment:366 Changed 6 years ago by
 Commit changed from 0a03d71d9df26c42cf91dab3df51ca549c434718 to 1560ce8dff16ed1fceabcc5656312f718d874211
Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:
1560ce8  Rename biseq_reverse_contains as biseq_startswith_tail

comment:367 Changed 6 years ago by
OK, I am fine with that change.
comment:368 Changed 6 years ago by
Thanks for the support!
For me, this ticket is positive_review except for the interrupt stuff.
comment:369 Changed 6 years ago by
Since yesterday evening, I tried for about 20 times to push my changes (implementing interrupt stuff). But it doesn't work. Any idea why?
As usual, the error message is not very helpful:
> git trac push 15820
Pushing to Trac #15820...
Guessed remote branch: u/SimonKing/ticket/15820
Enter passphrase for key '/home/king/.ssh/id_rsa':
Traceback (most recent call last):
  File "/home/king/bin/git-trac", line 18, in <module>
    cmdline.launch()
  File "/home/king/Sage/git/git-trac-command/git_trac/cmdline.py", line 210, in launch
    app.push(ticket_number, remote=args.remote, force=args.force)
  File "/home/king/Sage/git/git-trac-command/git_trac/app.py", line 194, in push
    self.repo.push(remote, force)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_repository.py", line 181, in push
    self.git.echo.push('trac', refspec)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 341, in meth
    return self.execute(git_cmd, *args, **kwds)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 98, in execute
    popen_stderr=subprocess.PIPE)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 263, in _run
    raise GitError(result)
git_trac.git_error.GitError
Is that a problem on my side, or on trac?
Anyway, the diff would be

src/sage/data_structures/bounded_integer_sequences.pyx

diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx
index 7f5e632..ccda54c 100644
--- a/src/sage/data_structures/bounded_integer_sequences.pyx
+++ b/src/sage/data_structures/bounded_integer_sequences.pyx
@@ -124,7 +124,9 @@ cdef bint biseq_init(biseq_t R, mp_size_t l, mp_bitcnt_t itemsize) except -1:
         totalbitsize = l * itemsize
     else:
         totalbitsize = 1
+    sig_on()
     bitset_init(R.data, totalbitsize)
+    sig_off()
     R.length = l
     R.itembitsize = itemsize
     R.mask_item = limb_lower_bits_up(itemsize)
@@ -140,7 +142,9 @@ cdef bint biseq_init_copy(biseq_t R, biseq_t S) except -1:
     Initialize ``R`` as a copy of ``S``.
     """
     biseq_init(R, S.length, S.itembitsize)
+    sig_on()
     bitset_copy(R.data, S.data)
+    sig_off()
 
 #
 # Conversion
@@ -162,6 +166,7 @@ cdef bint biseq_init_list(biseq_t R, list data, size_t bound) except -1:
 
     biseq_init(R, len(data), BIT_COUNT(bound|<size_t>1))
 
+    sig_check()
     for item in data:
         item_c = item
         if item_c > bound:
@@ -187,8 +192,10 @@ cdef bint biseq_init_concat(biseq_t R, biseq_t S1, biseq_t S2) except -1:
     The result is written into ``R``, which must not be initialised
     """
     biseq_init(R, S1.length + S2.length, S1.itembitsize)
+    sig_on()
     bitset_lshift(R.data, S2.data, S1.length * S1.itembitsize)
     bitset_or(R.data, R.data, S1.data)
+    sig_off()
 
 
 cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2) except -1:
@@ -207,6 +214,7 @@ cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2) except -1:
         return False
     if S2.length == 0:
         return True
+    sig_check()
     return mpn_equal_bits(S1.data.bits, S2.data.bits, S2.data.size)
 
 
@@ -304,7 +312,9 @@ cdef bint biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop
 
     if step == 1:
         # Slicing essentially boils down to a shift operation.
+        sig_on()
         bitset_rshift(R.data, S.data, start*S.itembitsize)
+        sig_off()
         return 0
 
     # In the general case, we move item by item.
@@ -340,6 +350,7 @@ cdef mp_size_t biseq_contains(biseq_t S1, biseq_t S2, mp_size_t start) except -2
     if S2.length == 0:
         return start
     cdef mp_size_t index
+    sig_check()
     for index from start <= index <= S1.length-S2.length:
         if mpn_equal_bits_shifted(S2.data.bits, S1.data.bits,
                 S2.length*S2.itembitsize, index*S2.itembitsize):
@@ -373,6 +384,7 @@ cdef mp_size_t biseq_startswith_tail(biseq_t S1, biseq_t S2, mp_size_t start) ex
     if S1.length < S2.length - start:
         start = S2.length - S1.length
     cdef mp_size_t index
+    sig_check()
     for index from start <= index < S2.length:
         if mpn_equal_bits_shifted(S1.data.bits, S2.data.bits,
                 (S2.length - index)*S2.itembitsize, index*S2.itembitsize):
@@ -1335,7 +1347,9 @@ cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize
     cdef BoundedIntegerSequence out = BoundedIntegerSequence.__new__(BoundedIntegerSequence)
     # bitset_unpickle assumes that out.data.data is initialised.
     biseq_init(out.data, length, itembitsize)
+    sig_on()
     bitset_unpickle(out.data.data, bitset_data)
+    sig_off()
     return out
 
 def _biseq_stresstest():
comment:370 Changed 6 years ago by
I can't help with git...
comment:371 Changed 6 years ago by
 Commit changed from 1560ce8dff16ed1fceabcc5656312f718d874211 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184
Branch pushed to git repo; I updated commit sha1. New commits:
4e9e1c5  Add interrupt handling to bounded integer sequences

comment:372 Changed 6 years ago by
 Status changed from needs_work to needs_review
 Work issues interrupt handling deleted
comment:373 followup: ↓ 378 Changed 6 years ago by
Questions:
 Is the stresstest really running forever?? It is supposed to take a few seconds. I will test in a few minutes.
 When should one use sig_check() and when sig_on()/sig_off()? I thought it is sig_check() when it could happen that an error or a return may happen that is not caused by a keyboard interrupt. But perhaps I am mistaken.
comment:374 followup: ↓ 375 Changed 6 years ago by
Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!
comment:375 in reply to: ↑ 374 ; followup: ↓ 377 Changed 6 years ago by
Replying to SimonKing:
Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!
Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.
comment:376 Changed 6 years ago by
Should we address the interrupt test in a different way? If not, I think it is a positive review from my side. But I'd appreciate if you could answer my question on sig_on/sig_off vs. sig_check.
comment:377 in reply to: ↑ 375 Changed 6 years ago by
Replying to SimonKing:
Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.
I also tested a bit more and the exception indeed happens most often in randint() but also sometimes in other places.
comment:378 in reply to: ↑ 373 Changed 6 years ago by
Replying to SimonKing:
When should one use sig_check() and when sig_on()/sig_off()?
Does http://www.sagemath.org/doc/developer/coding_in_cython.html#interruptandsignalhandling answer your question?
comment:379 followup: ↓ 380 Changed 6 years ago by
Should we add tests in the spirit of this:
sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(4,[1,2,3,2,3,2,3])
sage: while 1:
....:     C = B+B
....:
^C
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-dda173c83bbe> in <module>()
----> 1 while Integer(1):
      2     C = B+B
      3
/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sage_python_check_interrupt (build/cythonized/sage/ext/c_lib.c:1683)()
/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()
KeyboardInterrupt:
sage: while 1: C = B+B
....:
^C
---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-4-dda173c83bbe> in <module>()
      1 while Integer(1):
----> 2     C = B+B
      3
/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence.__add__ (build/cythonized/sage/data_structures/bounded_integer_sequences.c:10964)()
/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_concat (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7632)()
/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()
KeyboardInterrupt:
As you can see, the interruption occurs in the correct place already in the second try.
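The same kind of interruptibility check can be imitated in plain Python with a timer signal. This is a POSIX-only sketch (signal.alarm does not exist on Windows), and the busy loop merely stands in for the `while 1: C = B+B` loop above:

```python
import signal

def _raise_interrupt(signum, frame):
    # Turn the alarm into the same exception that Ctrl-C would raise.
    raise KeyboardInterrupt

signal.signal(signal.SIGALRM, _raise_interrupt)
signal.alarm(1)  # deliver SIGALRM after about one second

interrupted = False
try:
    while True:  # stand-in for a long-running computation
        pass
except KeyboardInterrupt:
    interrupted = True
finally:
    signal.alarm(0)  # cancel any pending alarm

print(interrupted)  # → True
```

The limitation noted above applies here too: the signal may arrive anywhere in the loop, so such a test shows that *something* is interruptible, not that a specific C-level function is.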
comment:380 in reply to: ↑ 379 ; followup: ↓ 381 Changed 6 years ago by
Replying to SimonKing:
Should we add tests in the spirit of this:
We could, but I'm not sure it is worth the trouble.
comment:381 in reply to: ↑ 380 Changed 6 years ago by
 Status changed from needs_review to positive_review
comment:382 Changed 6 years ago by
 Branch changed from u/jdemeyer/ticket/15820 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184
 Resolution set to fixed
 Status changed from positive_review to closed
comment:383 followup: ↓ 384 Changed 6 years ago by
 Commit 4e9e1c51f6524bfa2d2c55d197b82a983dd01184 deleted
I just saw this ticket on the trac timeline; I wonder whether two similar objects are being developed in parallel by disjoint sets of people, see #17013.
comment:384 in reply to: ↑ 383 Changed 6 years ago by
Replying to tmonteil:
I just saw this ticket on the trac timeline, i wonder whether two similar objects are being developed in parallel by disjoint sets of people, see #17013.
Indeed there is an obvious overlap. If I understand correctly, #17013 uses char * as underlying data structure, which means they are restricted to an alphabet of length at most 255. I have experimented with a similar implementation, too, but found that an implementation based on gmp gives better speed than what I was able to obtain with char*.
There are various possible more or less obvious data formats:
 char* or unsigned int*, hence, as a list of integers, each integer corresponding to one edge of the quiver. char* would bound the total number of edges of the quiver to 256 (which might be too small), whereas unsigned int* might consume too much memory.
 char*, each integer corresponding to an outgoing edge of a vertex. Hence, the total number of edges may be arbitrary; we only have a local bound of 128 on the number of outgoing edges, which hopefully is enough (it certainly is enough for me).
In 2. and 4. (and probably in 1. and 3. as well) we should explicitly store the start and end point of the path. The length should probably be stored as well.
In my applications, the following operations on paths are needed to be fast: Concatenation, hash/comparison, testing whether a path p starts with a path q, and less important whether a path p contains a path q.
Concatenation is particularly easy in 1. and 2. (just concatenation of lists, which is achieved by memcopy). It is a bit more difficult in 3. and 4., where the bytes of the second factor need to be shifted to get the compressed data seamless.
Hash and comparison are easy in all four formats, but of course best speed would be obtained in 4., simply since the amount of data that needs to be processed is smaller than in 1. to 3.
In all four proposed data formats, it is easy to test whether path p starts with path q. Again, having the data stored in as little memory as possible will speed things up.
It is easy in 1. to test whether path p contains path q (as a subpath that is not necessarily an initial segment), whereas this becomes a bit more difficult in 2. to 4.
What format would you suggest? Perhaps it makes sense to have some initial implementations of all four, and then compare directly.
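The packed variant can be sketched in plain Python using an arbitrary-precision integer as the bit container; the names (`pack`, `unpack`, `concat`) and the fixed item size are illustrative only, not part of any of the implementations discussed here:

```python
ITEM_BITS = 7  # e.g. a local bound of 128 outgoing edges fits in 7 bits

def pack(items):
    """Pack a list of small integers into a (bit-container, length) pair."""
    data = 0
    for pos, item in enumerate(items):
        assert 0 <= item < (1 << ITEM_BITS)
        data |= item << (pos * ITEM_BITS)
    return data, len(items)

def unpack(seq):
    """Recover the list of items from a packed sequence."""
    data, length = seq
    mask = (1 << ITEM_BITS) - 1
    return [(data >> (pos * ITEM_BITS)) & mask for pos in range(length)]

def concat(a, b):
    """Concatenation: shift the second factor past the first, then or."""
    (da, la), (db, lb) = a, b
    return da | (db << (la * ITEM_BITS)), la + lb

p = pack([1, 2, 3])
q = pack([4, 5])
print(unpack(concat(p, q)))  # → [1, 2, 3, 4, 5]
```

This makes the trade-off above concrete: concatenation needs a shift to make the compressed data seamless, whereas in the uncompressed formats it would be a plain memcpy.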