Opened 4 years ago

Closed 3 years ago

Last modified 3 years ago

#15820 closed enhancement (fixed)

Implement sequences of bounded integers

Reported by: SimonKing Owned by:
Priority: major Milestone: sage-6.4
Component: algebra Keywords: sequence bounded integer
Cc: JStarx, jhpalmieri, was, stumpc5, saliola, SimonKing, gmoose05, nthiery, virmaux, ncohen, hivert Merged in:
Authors: Simon King, Jeroen Demeyer Reviewers: Jeroen Demeyer, Simon King
Report Upstream: N/A Work issues:
Branch: 4e9e1c5 (Commits) Commit:
Dependencies: #17195, #17196 Stopgaps:

Description (last modified by SimonKing)

#12630 introduced tools to compute with quiver algebras and quiver representations. It is all Python code, with paths internally stored as lists of labelled edges of digraphs.

The aim of this ticket is to provide an efficient implementation of lists of bounded integers, with concatenation. This can be used to implement paths of a quiver more efficiently, but may also have other applications (say, elements of free groups, or other uses in sage-combinat).

Attachments (1)

path_test.pyx (27.3 KB) - added by SimonKing 4 years ago.
A proof of concept


Change History (385)

comment:1 Changed 4 years ago by SimonKing

There are various possible more or less obvious data formats:

  1. Store a path as char* or unsigned int*, hence, as a list of integers, each integer corresponding to one edge of the quiver. char* would bound the total number of edges of the quiver to 256 (which might be too small), whereas unsigned int* might consume too much memory.
  2. Store a path as char*, each integer corresponding to an outgoing edge of a vertex. Hence, the total number of edges may be arbitrary; we only have a local bound of 128 on the number of outgoing edges---this hopefully is enough (it certainly is enough for me).
  3. As in 1., but the representation is compressed: If we have less than 128 edges, then one byte can represent a pair of edges, and a single 32- or 64-bit int could store even longer sequences of edges.
  4. As in 2., but the representation is compressed: If each vertex has less than 8 outgoing edges, then a byte can represent a pair and a single 32-bit word can represent a 10-tuple of edges (if I did not miscalculate).

In 2. and 4. (and probably in 1. and 3. as well) we should explicitly store the start and end points of the path. The length should probably be stored as well.
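To make formats 3 and 4 concrete, here is a minimal Python sketch (a hypothetical helper, not taken from path_test.pyx) of packing a sequence of integers bounded by B into 64-bit words:

def pack(seq, B):
    bits = max((B - 1).bit_length(), 1)  # bits per entry; entries are 0 .. B-1
    per_word = 64 // bits                # format 3: as many entries per word as fit
    words = []
    for i in range(0, len(seq), per_word):
        word = 0
        for j, x in enumerate(seq[i:i + per_word]):
            word |= x << (j * bits)      # entry j sits at bit offset j*bits
        words.append(word)
    return words, bits

With B = 27 this packs 12 entries of 5 bits into each word, leaving 4 unused top bits; format 4 would instead round bits up to a power of 2 (here 8), trading density for simpler code.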

In my applications, the following operations on paths need to be fast: concatenation, hash/comparison, testing whether a path p starts with a path q, and, less importantly, whether a path p contains a path q.

Concatenation is particularly easy in 1. and 2. (just concatenation of lists, which is achieved by memcpy). It is a bit more difficult in 3. and 4., where the bytes of the second factor need to be shifted so that the compressed data fit together seamlessly.
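To illustrate the shifting, a sketch building on pack() above (hypothetical names; it assumes the word layout from the previous sketch): if the first factor does not fill its last word, every word of the second factor has to be split across two words of the result.

def concat_packed(words1, len1, words2, bits):
    per_word = 64 // bits
    cap = per_word * bits               # payload bits per word (60 if bits == 5)
    used = (len1 % per_word) * bits     # occupied bits in the last word of factor 1
    if used == 0:
        return words1 + words2          # aligned case: plain list concatenation
    out = list(words1)
    for w in words2:
        out[-1] |= (w << used) & ((1 << cap) - 1)  # low part of w fills the gap
        out.append(w >> (cap - used))              # high part opens a new word
    return out  # a real implementation would trim a possibly empty last word

In formats 1 and 2 there is one entry per block, so the aligned case always applies and concatenation is a single memcpy.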

Hash and comparison are easy in all four formats, but of course best speed would be obtained in 4., simply since the amount of data that needs to be processed is smaller than in 1.-3.

In all four proposed data formats, it is easy to test whether path p starts with path q. Again, having the data stored in as little memory as possible will speed things up.

It is easy in 1. to test whether path p contains path q (as a subpath that is not necessarily an initial segment), whereas this becomes a bit more difficult in 2.-4.

What format would you suggest? Perhaps it makes sense to have some initial implementations of all four, and then compare directly.

comment:2 follow-up: Changed 4 years ago by nthiery

I'd say:

  • Some implementation without limitations (the one from the previous ticket I guess)
  • Whichever of 1-4 covers your needs (256 edges should already be quite big for Gröbner basis calculations, right?)
  • As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.

But that's just my 2 cents ... I don't have specific experience!

Cheers,

comment:3 in reply to: ↑ 2 Changed 4 years ago by SimonKing

Hi Nicolas,

Replying to nthiery:

  • Some implementation without limitations (the one from the previous ticket I guess)

An implementation which represents edges by unsigned long long can be considered "unlimited".

  • Whichever of 1-4 covers your needs (256 edges should already be quite big for Gröbner basis calculations, right?)

Yes, but I want the code to be general.

  • As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.

Actually, on the level of paths, the implementations come in pairs: The compressed representations differ from the non-compressed---at the end of the day, we just have different ways of representing sequences of integers. Only when it comes to interpreting a sequence of integers as a path in a quiver will we have further differences: Does "[1,2]" mean "start with arrow 1 of the quiver and continue with arrow 2 of the quiver", or does it mean "travel along the outgoing arrow 1 of the vertex we are currently located at, and then continue with outgoing arrow 2 of the vertex we just arrived at".

Currently, I prefer a compressed representation by unsigned long long*. Rationale:

  • This will work even for extremely large quivers---the only restriction is that the number of edges can be represented as an unsigned long long (a larger quiver wouldn't fit into the computer's memory anyway).
  • For small quivers, even paths of length 10 will easily fit into a single unsigned long long. That's quite efficient, both memory- and speed-wise.
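As a quick sanity check of the second point (plain Python, hypothetical helper):

def arrows_per_word(n_edges, word_bits=64):
    bits = max((n_edges - 1).bit_length(), 1)   # bits needed to address one arrow
    return word_bits // bits

arrows_per_word(27) is 12 and arrows_per_word(64) is 10, so for any quiver with at most 64 arrows a path of length 10 indeed fits into a single unsigned long long.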

A bit later I will post experimental code (not yet talking about quiver paths but about sequences of integers) giving some evidence on efficiency. And then we can still see how we interpret the integer sequences: Having a global or a local list of arrows?

comment:4 Changed 4 years ago by SimonKing

I have attached a toy (or proof-of-concept) implementation; see path_test.pyx. In a real implementation in compressed representation, it would be the job of a PathSemigroup to find out how many edges fit into one word. And of course, in the toy implementation I am not trying to translate an integer sequence into the string representation of an element of the path semigroup.

What implementation would you suggest to use in "reality"? Is there another implementation of "sequences of bounded integers" that you would recommend? And would you enumerate the arrows of a quiver locally (i.e., only number the outgoing edges at each vertex) or globally? The former will result in better compression, but would make it more difficult to test for paths p,q whether there are paths r,s with p==r*q*s.

Now for the timings, and sorry for the long post...

Here, I am timing concatenation, hash, comparison, and comparison of initial segments. In all examples, we use the following paths of various sizes.

p5 = Path(ZZ, 5, range(5), 1, 1)
p25 = Path(ZZ, 25, range(25), 1, 1)
p50 = Path(ZZ, 50, range(50), 1, 1)
p100 = Path(ZZ, 100, range(100), 1, 1)
p1000 = Path(ZZ, 1000, range(1000), 1, 1)
q5 = Path(ZZ, 5, range(1,6), 1, 1)
q25 = Path(ZZ, 25, range(1,26), 1, 1)
q50 = Path(ZZ, 50, range(1,51), 1, 1)
q100 = Path(ZZ, 100, range(1,101), 1, 1)
q1000 = Path(ZZ, 1000, range(1,1001), 1, 1)
r5 = Path(ZZ, 5, range(1,5)+[0], 1, 1)
r25 = Path(ZZ, 25, range(1,25)+[0], 1, 1)
r50 = Path(ZZ, 50, range(1,50)+[0], 1, 1)
r100 = Path(ZZ, 100, range(1,100)+[0], 1, 1)
r1000 = Path(ZZ, 1000, range(1,1000)+[0], 1, 1)
s5 = Path(ZZ, 5, range(5), 1, 1)
s25 = Path(ZZ, 25, range(25), 1, 1)
s50 = Path(ZZ, 50, range(50), 1, 1)
s100 = Path(ZZ, 100, range(100), 1, 1)
s1000 = Path(ZZ, 1000, range(1000), 1, 1)

We are doing the following tests.

Concatenation:

%timeit p5*q5
%timeit p5*q25
%timeit p25*q25
%timeit p25*q50
%timeit p50*q50
%timeit p50*q100
%timeit p100*q100
%timeit p100*q1000
%timeit p1000*q1000

Hash:

%timeit hash(p5)
%timeit hash(p25)
%timeit hash(p50)
%timeit hash(p100)
%timeit hash(p1000)

Comparison:

  1. equal
    %timeit p5==s5
    %timeit p25==s25
    %timeit p50==s50
    %timeit p100==s100
    %timeit p1000==s1000
    
  2. obviously different (first item differs)
    %timeit p5==r5
    %timeit p25==r25
    %timeit p50==r50
    %timeit p100==r100
    %timeit p1000==r1000
    
  3. less obviously different (only the last item differs)
    %timeit q5==r5
    %timeit q25==r25
    %timeit q50==r50
    %timeit q100==r100
    %timeit q1000==r1000
    

Startswith:

%timeit q1000.startswith(q100)  # it does start with
%timeit q1000.startswith(r100)  # it doesn't start with

Uncompressed integer lists

Here, we define Path=Path_v1 (see path_test.pyx).

Using ctypedef unsigned long block

sage: %timeit p5*q5
1000000 loops, best of 3: 739 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 801 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 810 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 885 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 959 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.02 us per loop
sage: %timeit p100*q1000
100000 loops, best of 3: 1.99 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 3.06 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 135 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 166 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 214 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 333 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.09 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.63 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 5.91 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 11 us per loop
sage: %timeit p100==s100
10000 loops, best of 3: 23 us per loop
sage: %timeit p1000==s1000
1000 loops, best of 3: 212 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 634 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 659 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 659 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 686 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 698 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.65 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 5.83 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 11.5 us per loop
sage: %timeit q100==r100
10000 loops, best of 3: 22 us per loop
sage: %timeit q1000==r1000
1000 loops, best of 3: 213 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 325 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 311 ns per loop

Using ctypedef unsigned short block

sage: %timeit p5*q5
1000000 loops, best of 3: 700 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 738 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 820 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 808 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 877 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 909 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.49 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 2.13 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 138 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 179 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 236 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 383 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.61 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.08 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 3.42 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 6.53 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 14.9 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 137 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 613 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 591 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 602 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 603 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 635 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.15 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.63 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.45 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.8 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 140 us per loop
sage: q1000.startswith(q100)
True
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 336 ns per loop
sage: q1000.startswith(r100)
False
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 325 ns per loop

Using ctypedef unsigned char block

sage: %timeit p5*q5
1000000 loops, best of 3: 679 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 725 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 760 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 772 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 761 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 790 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.1 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.31 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 133 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 176 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 234 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 374 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.53 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.08 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 3.36 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 6.3 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 12.5 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 121 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 581 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 589 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 636 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.06 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.46 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.51 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.7 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 121 us per loop
sage: q1000.startswith(q100)
True
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 324 ns per loop
sage: %timeit q1000.startswith(r100) 
1000000 loops, best of 3: 315 ns per loop

Compressed integer lists

Here, we define Path=Path_v2 (see path_test.pyx), and do

sage: SizeManager.set_edge_number(27)

so that there is a total of only 27 edges (resp. 27 is the maximal number of outgoing arrows at a vertex). In particular, all integers in the sequences below are taken "mod 27". Since 27 needs 5 bits, a compressed representation using unsigned char* won't make sense. Hence, I am only testing long and short.

Using ctypedef unsigned long block

sage: %timeit p5*q5
1000000 loops, best of 3: 676 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 671 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 683 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 725 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 759 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 787 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 811 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.44 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.61 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 132 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 134 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 142 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 155 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 466 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 710 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.71 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.6 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.4 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 38 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 692 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 715 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 724 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 729 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 752 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 706 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.69 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.59 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 4.4 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 37.6 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 272 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 257 ns per loop

Using ctypedef unsigned short block

sage: %timeit p5*q5
1000000 loops, best of 3: 718 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 682 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 710 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 740 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 722 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 781 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 858 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.98 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 2.17 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 132 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 168 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 206 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 951 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 723 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.77 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.93 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 5.32 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 47.5 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 577 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 586 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 588 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 607 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 754 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.87 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.89 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 5.28 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 47.3 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 272 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 256 ns per loop

comment:5 Changed 4 years ago by SimonKing

It turns out that there are bugs in the toy code. But that's no surprise...

comment:6 Changed 4 years ago by SimonKing

I have fixed the problems in path_test.pyx.

It was:

  1. bitshift by a wrong offset, and
  2. the bits that aren't used (e.g., the last 4 bits of a uint64_t when storing 12 arrows of 5 bits each) should be kept blank.

The latter involves an additional bitwise "and". So, probably the timings will slightly deteriorate.
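For concreteness, the second fix amounts to masking after every shift, along these lines (a sketch; DataMask is one of the constants declared in the attachment, so presumably it plays this role there):

cap = 12 * 5                      # 12 five-bit arrows per 64-bit word
DataMask = (1 << cap) - 1         # zeroes the 4 unused top bits
word = (word << 5) & DataMask     # after any shift, clear the garbage bits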

comment:7 Changed 4 years ago by SimonKing

There is one other variation that I want to test.

Currently, if the number of edges fits into 5 bits, then a 64-bit word is filled with data of 12 edges, plus 4 bits of garbage. The garbage bits must always be zero, for otherwise the shift operations would pollute real data with them. Keeping the garbage zero involves additional operations.

Alternatively, one could try to put fewer arrows into one word, so that there are no garbage bits. This could be done by encoding each arrow by a number of bits that is a power of 2. Hence, in the setting above, one would encode each edge by 8 rather than 5 bits, fitting 8 arrows into one 64-bit word. It is less dense; however, the code gets simpler and should have less overhead.
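In code, the alternative just rounds the chunk size up to the next power of 2 (hypothetical helper, plain Python):

def chunk_bits(n_edges):
    bits = max((n_edges - 1).bit_length(), 1)   # minimal bits per arrow
    power = 1
    while power < bits:
        power <<= 1                             # round up to a power of 2
    return power

For 27 edges this returns 8 instead of 5, so only 8 arrows fit into a 64-bit word, but there are no garbage bits that need to be kept clean.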

comment:8 Changed 4 years ago by SimonKing

I have replaced path_test.pyx again, adding two more implementations.

Do you have further suggestions on storage of lists of bounded integers? If you haven't, then I'll try to assess the timings stated in the various comments, according to what I expect to be useful in my applications.

Here are details on the two new implementations:

In Path_v3, I use Sage's Integer class as a bit list. Sounds crazy, but it works rather well. I assume that bitshifts and comparison (and that's what we are using here) are internally written in assembler, and for us, the additional benefit is that we do not need to take care of the size of blocks (8 bit? 32 bit? 64 bit? 128 bit?) in which we store stuff---Integer does it for us. Perhaps the timings could be improved further by using the underlying C library (GMP's mpz_t) directly.

In Path_v4, I provide a slightly simplified version of Path_v2: The data is compressed, but so that the chunks used to store one arrow fit seamlessly into a memory block. Hence, when we use 64-bit blocks, and have 27 arrows, then Path_v2 would store the arrows in chunks of 5 bit (hence, it can store 12 arrows in one memory block, leaving 4 bit empty), whereas Path_v4 would store the arrows in chunks of 8 bit (hence, it can only store 8 arrows in one memory block, but since this fits neatly into one memory block, the code can be simplified).
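The idea behind Path_v3 can be mimicked with Python's arbitrary-precision integers (a sketch with hypothetical names, not the attachment's code):

def seq_to_int(seq, bits):
    n = 1                             # sentinel bit on top, so leading zero entries survive
    for x in reversed(seq):
        n = (n << bits) | x           # entry 0 ends up in the lowest bits
    return n

def startswith_int(n, m, len_m, bits):
    mask = (1 << (len_m * bits)) - 1  # n starts with m iff their low len_m*bits bits agree
    return (n & mask) == (m & mask)

Concatenation, hash and comparison then reduce to plain integer arithmetic in the underlying bignum library.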

The timings for Path_v3:

sage: %timeit p5*q5
1000000 loops, best of 3: 645 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 631 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 635 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 1.23 us per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 1.82 us per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 1.83 us per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.37 us per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.82 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.78 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 164 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 197 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 210 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 274 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 1.94 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 667 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 638 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 658 ns per loop
sage: %timeit p1000==s1000
1000000 loops, best of 3: 817 ns per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 746 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 723 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 731 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 733 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 710 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 733 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 718 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 719 ns per loop
sage: %timeit q1000==r1000
1000000 loops, best of 3: 755 ns per loop
sage: %timeit q1000.startswith(q100)
100000 loops, best of 3: 2.46 us per loop
sage: %timeit q1000.startswith(r100)
100000 loops, best of 3: 2.45 us per loop

The timings for Path_v4:

sage: %timeit p5*q5
1000000 loops, best of 3: 730 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 762 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 806 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 849 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 854 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 931 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 930 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.97 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.45 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 136 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 159 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 182 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 600 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 700 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.65 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.48 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.21 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 36.4 us per loop
sage: %timeit p5==r5
100000 loops, best of 3: 706 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 764 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 740 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 742 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 822 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 693 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.6 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.54 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 4.27 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 37.1 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 255 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 243 ns per loop

comment:9 Changed 4 years ago by SimonKing

Hmmm. Perhaps I should have a look at sage/misc/bitset.pxi.

comment:10 Changed 4 years ago by SimonKing

I couldn't directly use sage/misc/bitset, but I could learn from it: I have improved the Path_v4 implementation (here, the arrows are encoded by chunks of memory whose size is a power of 2). This was possible by using memcmp in some places (not in the method startswith(), where a loop over the data is faster than memcmp) and, in particular, by using bitshift operations to replace multiplications and integer divisions (this is only possible with Path_v4, since there the factors are powers of 2 and can thus be realised as shifts).
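The point of the power-of-2 chunks is that locating entry n needs no multiplication or division at all (a sketch mirroring the roles of DivBAPB and ModBAPB from the attachment; 8-bit chunks in 64-bit blocks assumed):

DivBAPB = 3                    # 8 entries per block, so n // 8 == n >> 3
ModBAPB = 7                    # and n % 8 == n & 7

def get_entry(blocks, n):
    block = blocks[n >> DivBAPB]         # which 64-bit block holds entry n
    offset = (n & ModBAPB) << 3          # bit offset inside the block (index times 8)
    return (block >> offset) & 0xFF      # extract one 8-bit chunk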

New timings with Path=Path_v4 and SizeManager.set_edge_number(27):

sage: %timeit p5*q5
1000000 loops, best of 3: 684 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 726 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 798 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 862 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 827 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 924 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 915 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.91 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.42 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 158 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 171 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 194 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 594 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 498 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 547 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 631 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 712 ns per loop
sage: %timeit p1000==s1000
100000 loops, best of 3: 2.42 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 499 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 514 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 549 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 526 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 545 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 506 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 544 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 589 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 704 ns per loop
sage: %timeit q1000==r1000
100000 loops, best of 3: 2.48 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 263 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 232 ns per loop

comment:11 Changed 4 years ago by SimonKing

PS: I chose 27 edges, since this should be particularly bad for Path_v4. Indeed, in Path_v2, 12 arrows can be put into one uint64_t, whereas in Path_v4 only 8 arrows fit into one uint64_t. So, if Path_v4 turns out to be faster than Path_v2 with 27 edges, then it is likely to be generally faster, I think.

comment:12 Changed 4 years ago by SimonKing

From my side, this is the last part of the proof-of-concept implementations, before assessing the benchmark tests and choosing an implementation for the elements of path semigroups...

In Path_v5, I am using the GMP library directly, thus avoiding some overhead compared with Path_v3. The timings I get (with Path=Path_v5 and SizeManager.set_edge_number(27)):

sage: %timeit p5*q5
1000000 loops, best of 3: 720 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 814 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 823 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 808 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 828 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 890 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 866 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.18 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.38 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 159 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 171 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 198 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 266 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 1.93 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 466 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 509 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 495 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 485 ns per loop
sage: %timeit p1000==s1000
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 451 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 468 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 463 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 468 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 482 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 448 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 462 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 448 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 457 ns per loop
sage: %timeit q1000==r1000
1000000 loops, best of 3: 470 ns per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 673 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 657 ns per loop

comment:13 follow-up: Changed 4 years ago by ncohen

Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing. If the size of a block is a power of two it should be quite good, don't you think ?

This being said, perhaps it should be implemented like the bitset stuff, in an algebra-free file, to store sequences of integers as you said.

How do you define your hashing function and your comparison function, by the way ?

Nathann

comment:14 in reply to: ↑ 13 Changed 4 years ago by SimonKing

Replying to ncohen:

Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing.

What do you mean by "the last one"? Using GMP's mpz_t directly, which is Path_v5 in the attachment? Or the improved "semicompressed" version of uint64_t* (storing chunks of size 2^n rather than the smallest possible chunk size), which is Path_v4?

If the size of a block is a power of two it should be quite good, don't you think ?

Should be. My own summary of the timings above would be this. We have 6 benchmarks, and in the following list I am giving for each benchmark a small, a medium and a large example. In the end, + or - indicate whether the respective implementation did well/poorly in the three examples.

  1. Concatenation
                         5x5   50x50  1000x1000
    Uncompressed long : 739 ns  885 ns 3.06 us  - -
    Uncompressed short: 700 ns  808 ns 2.13 us
    Uncompressed char : 679 ns  772 ns 1.31 us    +
    Compressed   long : 676 ns  759 ns 1.61 us
    Compressed   short: 718 ns  722 ns 1.98 us   +
    Semicompr.   long : 684 ns  827 ns 1.42 us
    Integer           : 645 ns 1.82 us 1.78 us  +-
    GMP          mpz_t: 723 ns  799 ns 1.41 us
    
  2. Hash
                         5      100     1000
    Uncompressed long : 135 ns 333 ns  2.09 us  +
    Uncompressed short: 138 ns 383 ns  2.61 us  +--
    Uncompressed char : 133 ns 374 ns  2.53 us  +
    Compressed   long : 132 ns 155 ns   466 ns  +++
    Compressed   short: 132 ns 206 ns   951 ns  +
    Semicompr.   long : 148 ns 194 ns   594 ns
    Integer           : 164 ns 274 ns  1.94 us  -
    GMP          mpz_t: 159 ns 266 ns  1.93 us
    
  3. == for equal paths
                         5       100     1000
    Uncompressed long : 1.63 us 23   us  212   us ---
    Uncompressed short: 1.08 us 14.9 us  137   us
    Uncompressed char : 1.08 us 12.5 us  121   us
    Compressed   long :  710 ns  4.4 us   38   us
    Compressed   short:  723 ns  5.3 us   47.5 us
    Semicompr.   long :  498 ns  712 ns    2.4 us
    Integer           :  667 ns  658 ns    817 ns
    GMP          mpz_t:  466 ns  485 ns    693 ns +++
    
  1. == for "easily" unequal paths
                         5       100     1000
    Uncompressed long : 634 ns   686 ns   698 ns
    Uncompressed short: 613 ns   603 ns   635 ns
    Uncompressed char : 581 ns   599 ns   636 ns
    Compressed   long : 692 ns   729 ns   752 ns  --
    Compressed   short: 577 ns   588 ns   607 ns
    Semicompr.   long : 499 ns   526 ns   545 ns
    Integer           : 746 ns   731 ns   733 ns --
    GMP          mpz_t: 451 ns   468 ns   482 ns +++
    
  1. == for "difficult" unequal paths
                         5       100      1000
    Uncompressed long : 1.65 us   22 us    213 us   ---
    Uncompressed short: 1.15 us   12.8 us  140 us
    Uncompressed char : 1.06 us   12.7 us  121 us
    Compressed   long :  706 ns    4.4 us   37.6 us
    Compressed   short:  754 ns    5.3 us   47.3 us
    Semicompr.   long :  506 ns    704 ns   2.5 us
    Integer           :  710 ns    719 ns   755 ns
    GMP          mpz_t:  448 ns    457 ns   470 ns  +++
    
  6. startswith (here we only have one example size, one where the answer is True, and one where the answer is False)
                         yes      no
    Uncompressed long :  325 ns  311 ns
    Uncompressed short:  336 ns  325 ns
    Uncompressed char :  324 ns  315 ns
    Compressed   long :  272 ns  257 ns  +
    Compressed   short:  272 ns  256 ns  +
    Semicompr.   long :  263 ns  232 ns  ++
    Integer           : 2.46 us 2.45 us  --
    GMP          mpz_t:  673 ns  657 ns
    

So, I don't think there is a clear winner in the competition.

This being said, perhaps it should be implemented like the bitset stuff, in an algebra-free file, to store sequences of integers as you said.

Yes, it would make sense.

How do you define your hashing function and your comparison function, by the way ?

Depends on the implementation. Of course, when I use Integer or mpz_t, I am using the hash function provided by GMP. When I implement a block*, where block is either char, uint32_t or uint64_t, I am using some hash function for lists that I found on the internet a while ago (I don't recall where). It is as follows:

    def __hash__(self):
        cdef block h
        cdef block* ptr
        # seed with the start point, end point and length of the path
        h = <block>(self.start^(self.end<<16)^(self._len))
        ptr = <block *>self.data
        cdef size_t i
        h ^= FNV_offset
        # FNV-style xor/multiply loop over the data blocks
        for i from 0<=i<self._len:
            h ^= ptr[i]
            h *= FNV_prime
        # CPython reserves -1 to signal an error in __hash__
        if h==-1:
            return -2
        return h

where FNV_prime and FNV_offset depend on the architecture. I am surprised that this is faster than GMP's hash function.

comment:15 Changed 4 years ago by SimonKing

  • Description modified (diff)
  • Summary changed from Improve efficiency of the path semigroup to Implement sequences of bounded integers

comment:16 Changed 4 years ago by SimonKing

  • Keywords sequence bounded integer added; quiver path algebra efficiency removed

As Nathann has mentioned, it would make sense to provide a general implementation of sequences of bounded integers (independent of quiver paths), and then quiver paths can inherit from it later. That's why I changed the ticket description.

comment:17 Changed 4 years ago by SimonKing

  • Dependencies #12630 deleted

Changed 4 years ago by SimonKing

A proof of concept

comment:18 Changed 4 years ago by SimonKing

For completeness, I have added Path_v6, which uses Python's tuple under the hood (which is an obvious way to implement paths, too).

Result:

sage: %timeit p5*q5
1000000 loops, best of 3: 439 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 542 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 612 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 869 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 1.1 us per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 1.39 us per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.41 us per loop
sage: %timeit p100*q1000
100000 loops, best of 3: 5.04 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 11 us per loop
sage: %timeit hash(p5)
1000000 loops, best of 3: 245 ns per loop
sage: %timeit hash(p25)
1000000 loops, best of 3: 532 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 884 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 1.63 us per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 14.6 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 885 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.71 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.63 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.53 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 39.3 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 785 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 829 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 799 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 791 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 822 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.48 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.72 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.63 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.5 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 116 us per loop
sage: %timeit q1000.startswith(q100)
100000 loops, best of 3: 4.7 us per loop
sage: %timeit q1000.startswith(r100)
100000 loops, best of 3: 4.72 us per loop

The comparison with the other implementations becomes this:

  1. Concatenation
                         5x5   50x50  1000x1000
    Uncompressed long : 739 ns  885 ns 3.06 us  - -
    Uncompressed short: 700 ns  808 ns 2.13 us
    Uncompressed char : 679 ns  772 ns 1.31 us    +
    Compressed   long : 676 ns  759 ns 1.61 us
    Compressed   short: 718 ns  722 ns 1.98 us   +
    Semicompr.   long : 684 ns  827 ns 1.42 us
    Integer           : 645 ns 1.82 us 1.78 us   -
    GMP          mpz_t: 723 ns  799 ns 1.41 us
    tuple             : 439 ns  1.1 us 11   us  + -
    
  2. Hash
                         5       100     1000
    Uncompressed long : 135 ns  333 ns  2.09 us  +
    Uncompressed short: 138 ns  383 ns  2.61 us  +--
    Uncompressed char : 133 ns  374 ns  2.53 us  +
    Compressed   long : 132 ns  155 ns   466 ns  +++
    Compressed   short: 132 ns  206 ns   951 ns  +
    Semicompr.   long : 148 ns  194 ns   594 ns
    Integer           : 164 ns  274 ns  1.94 us  
    GMP          mpz_t: 159 ns  266 ns  1.93 us
    tuple             : 245 ns 1.63 us 14.6  us  ---
    
  3. == for equal paths
                         5       100     1000
    Uncompressed long : 1.63 us 23   us  212   us ---
    Uncompressed short: 1.08 us 14.9 us  137   us
    Uncompressed char : 1.08 us 12.5 us  121   us
    Compressed   long :  710 ns  4.4 us   38   us
    Compressed   short:  723 ns  5.3 us   47.5 us
    Semicompr.   long :  498 ns  712 ns    2.4 us
    Integer           :  667 ns  658 ns    817 ns
    GMP          mpz_t:  466 ns  485 ns    693 ns +++
    tuple             :  885 ns  4.5 us   39.3 us
    
  1. == for "easily" unequal paths
                         5       100     1000
    Uncompressed long : 634 ns   686 ns   698 ns
    Uncompressed short: 613 ns   603 ns   635 ns
    Uncompressed char : 581 ns   599 ns   636 ns
    Compressed   long : 692 ns   729 ns   752 ns 
    Compressed   short: 577 ns   588 ns   607 ns
    Semicompr.   long : 499 ns   526 ns   545 ns
    Integer           : 746 ns   731 ns   733 ns 
    GMP          mpz_t: 451 ns   468 ns   482 ns +++
    tuple             : 785 ns   791 ns   822 ns ---
    
  1. == for "difficult" unequal paths
                         5       100      1000
    Uncompressed long : 1.65 us   22   us  213 us   ---
    Uncompressed short: 1.15 us   12.8 us  140 us
    Uncompressed char : 1.06 us   12.7 us  121 us
    Compressed   long :  706 ns    4.4 us   37.6 us
    Compressed   short:  754 ns    5.3 us   47.3 us
    Semicompr.   long :  506 ns    704 ns   2.5 us
    Integer           :  710 ns    719 ns   755 ns
    GMP          mpz_t:  448 ns    457 ns   470 ns  +++
    tuple             : 1.48 us   12.5 us  116 us
    
  6. startswith
                         yes      no
    Uncompressed long :  325 ns  311 ns
    Uncompressed short:  336 ns  325 ns
    Uncompressed char :  324 ns  315 ns
    Compressed   long :  272 ns  257 ns  +
    Compressed   short:  272 ns  256 ns  +
    Semicompr.   long :  263 ns  232 ns  ++
    Integer           : 2.46 us 2.45 us  
    GMP          mpz_t:  673 ns  657 ns
    tuple             : 4.7  us 4.72 us  --
    

So, using tuples is best for concatenation of short paths, but that's the only case where it is not too slow.

comment:19 follow-up: Changed 4 years ago by ncohen

Well, given the outcomes I guess it mostly depends on your own practical applications ?... :-)

I also wonder (because I hate them) how much time is spent dealing with parents. But all your implementations have to go through that at the moment so it does not change your comparisons.

Well, I am back to work at long long last. With a computer AND a screen. And in Brussels, which counts for the good mood :-P

Have fuuuuuuuuuuuuuun !

Nathann

comment:20 in reply to: ↑ 19 Changed 4 years ago by SimonKing

Hi Nathann,

Replying to ncohen:

I also wonder (because I hate them) how much time is spent dealing with parents.

Well, at some point parents have to come into play, namely when I want to implement not just sequences of bounded integers, but elements of the path semigroup of a quiver.

The strategic question is: Should the data defining a path be stored in an attribute of that path (the attribute would be an instance of BoundedIntegerSequence), or should the path itself be an instance of BoundedIntegerSequence? In the former case, it would be perfectly fine to have sequences of bounded integers parent-less; but I worry about the overhead of accessing the attribute. In the latter case, BoundedIntegerSequence should already be a sub-class of sage.structure.element.Element (and have a parent), since Cython does not allow multiple inheritance.

I tend to the latter solution also for a different reason: Instances of BoundedIntegerSequence have to know the bound for their elements, because the data storage depends on that bound. For efficiency (see attachment), several other constants are derived from that bound, and used in the algorithms.

Should each instance of BoundedIntegerSequence know all these constants? Or should there be a common parent for all sequences of integers that are bounded by a number B?

I tend to the latter. If you assign several ints (the afore-mentioned constants) to an instance of BoundedIntegerSequence, it will likely be more time consuming (during initialisation) than storing just a single datum (namely the parent).

That said, looking up the constants via the parent would be more time-consuming than looking up constants that are stored directly in the element.

Difficult....
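A minimal sketch of the parent-based option (plain Python, class names hypothetical): the constants are computed once per bound, and each sequence stores a single reference.

class BoundedIntegerSequences(object):      # one instance per bound B
    def __init__(self, B):
        self.bound = B
        self.bits = max((B - 1).bit_length(), 1)
        self.per_word = 64 // self.bits
        self.mask = (1 << self.bits) - 1    # further derived constants would go here

class BoundedIntegerSequence(object):
    def __init__(self, parent, data):
        self.parent = parent                # one pointer instead of many constants
        self.data = data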

comment:21 follow-up: Changed 4 years ago by ncohen

Hellooooo Simon !

Hmmmm... All this is important to settle before implementing those BoundedIntegerSequence objects somewhere indeed.

What I thought you intended was to write a very low-level data structure somewhere in the misc/ folder. The kind of stuff that one does not use by mistake, the kind of stuff that one does not use without reading the manual first. So I thought it would not exactly check its input, blindly trusting the developer and being as efficient as possible.

Then, I thought you would have higher-level algebraic stuff with parents, colors and blinking lights that would check its input indeed, and be what we want a user-friendly object to be.

But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want to create high-level objects in those functions or not ?

If you don't, then perhaps you would be better off with only low-level objects that you know how to use, which would not have to waste time (if time is actually wasted) on high-level problems.

And so I thought that your high-level objects would have a variable which would be this low-level object, etc, etc...

What do you have in mind ?

Nathann

comment:22 in reply to: ↑ 21 ; follow-up: Changed 4 years ago by SimonKing

Hi Nathann,

Replying to ncohen:

What I thought you intended was to write a very low-level data structure somewhere in the misc/ folder.

I somehow do.

The kind of stuff that one does not use by mistake,

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Then, I thought you would have higher-level algebraic stuff with parents, colors and blinking lights that would check its input indeed, and be what we want a user-friendly object to be.

Again ideally: Blinking colours should only be added when one really implements paths in quivers.

But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want to create high-level objects in those functions or not ?

Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of BoundedIntegerSequence, and only in the end interpret them as paths, when the battle smoke has vanished.

And so I thought that your high-level objects would have a variable which would be this low-level object, etc, etc...

Yes, probably that's better.

So, perhaps I should really do cdef class BoundedIntegerSequence(tuple) rather than cdef class BoundedIntegerSequence(MonoidElement).

But then, what to do with all these internally used constants that are needed even for the low-level functionality of BoundedIntegerSequence? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by B? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?

Or do you have a third solution for storing/accessing these constants efficiently?

comment:23 in reply to: ↑ 22 ; follow-up: Changed 4 years ago by ncohen

Helloooooo !!

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Why do you want it to inherit from tuple ? I do not know the effect of such a thing, what it brings and what should be feared

Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of BoundedIntegerSequence, and only in the end interpret them as paths, when the battle smoke has vanished.

Well, yep. Only create the high-level stuff with possible overhead when you need it. And you end up knowing exactly what your code does when you are writing the code of your time-critical function.

So, perhaps I should really do cdef class BoundedIntegerSequence(tuple) rather than cdef class BoundedIntegerSequence(MonoidElement).

Hmmm.. I still don't get why you want it to inherit from tuple. My natural move would be to implement this as a C structure, not even an object :-P

But then, what to do with all these internally used constants that are needed even for the low-level functionality of BoundedIntegerSequence? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by B? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?

Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files. You could have a BoundedIntegerSequence class implementing no function at all, extended by all the different implementations of the data structures you want. Even in the same file.

... And we don't need parents for that :-PPPP
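For comparison, a minimal sketch of the backend-style layout described above (hypothetical names): each implementation hardcodes its own constants, so no parent lookup is needed, but one class per constant set must be written (or generated from a template).

class BoundedIntegerSequence(object):
    pass                        # common interface only, no functions implemented

class BISCompressedLong(BoundedIntegerSequence):
    BITS_PER_BLOCK = 64         # fixed per implementation, as with the graph backends

class BISCompressedShort(BoundedIntegerSequence):
    BITS_PER_BLOCK = 16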

Nathann

comment:24 in reply to: ↑ 23 ; follow-up: Changed 4 years ago by SimonKing

Replying to ncohen:

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Why do you want it to inherit from tuple ?

By analogy to Sequence_generic (which inherits from list):

sage: S = Sequence([1,2,3])
sage: type(S)
<class 'sage.structure.sequence.Sequence_generic'>
sage: type(S).mro()
[sage.structure.sequence.Sequence_generic,
 sage.structure.sage_object.SageObject,
 list,
 object]

Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files.

No, that's a different situation. If you have a sequence S1 of integers bounded by B1 and a sequence S2 of integers bounded by B2, then the constants for S1 are different from the constants for S2. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file): when you do, for example, for x in S1, then you'll need different constants than when you do for x in S2.

So, these constants are not shared by all sequences, but by all sequences that share the same bound. Thus, it seems natural (to me, at least) to have one object O(B) that is shared by all sequences of bound B, so that O(B) provides the constants that belong to B.

Sure, O(B) could be written in C as well. But why not make it a parent, when all we do is let each sequence have a pointer to O(B)?

comment:25 in reply to: ↑ 24 ; follow-up: Changed 4 years ago by ncohen

By analogy to Sequence_generic (which inherits from list):

?...

Well. And why is that a good idea ? :-P

No, that's a different situation. If you have a sequence S1 of integers bounded by B1 and a sequence S2 of integers bounded by B2, then the constants for S1 are different from the constants for S2. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file): when you do, for example, for x in S1, then you'll need different constants than when you do for x in S2.

Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ? If it is, have you decided whether you want to allow it to be arbitrary, or whether you want it to be a power of 2 ?

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something :-)

I'd say.... 4,8,16,32 ? :-)

So, these constants are not shared by all sequences, but by all sequences that share the same bound. Thus, it seems natural (to me, at least) to have one object O(B) that is shared by all sequences of bound B, so that O(B) provides the constants that belong to B.

If your data structure is not an object, you could also add this width parameter as an input of the functions you use ? I guess all bounded integer sequences that you use in the same function would have the same bound.

But admittedly I love to split hairs. Tell me if you think that this is going too far :-P

Sure, O(B) could be written in C as well. But why not make it a parent, when all we do is let each sequence have a pointer to O(B)?

Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happily with a C pointer I can control, but not with a framework that I do not understand. If you understand it, then you know what you are doing. I don't :-P

Nathann

comment:26 in reply to: ↑ 25 ; follow-ups: Changed 4 years ago by SimonKing

Replying to ncohen:

Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ?

No. We have

cdef Integer NumberArrows
cdef Integer IntegerMask
cdef size_t ArrowBitSize, BigArrowBitSize
cdef block* bitmask = NULL
cdef block DataMask, BitMask
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize

For example, ModBAPB is used to replace n%BigArrowsPerBlock by a bit-wise and, which is noticeably faster.

If it is, have you decided whether you want to allow it to be arbitrary, or whether you want it to be a power of 2 ?

Depending on the implementation...

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something :-)

Isn't duplication of code supposed to smell?

If your data structure is not an object, you could also add this width parameter as an input of the functions you use ? I guess all bounded integer sequences that you use in the same function would have the same bound.

Do I understand correctly: You suggest that we should not provide a class BoundedIntegerSequence that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, mpz_t or unsigned long*) and can only be used in Cython code?

Then it would be better to revert this ticket to its original purpose: There would be...

  1. ... cdef functions operating on mpz_t resp. on unsigned long*,
  2. ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the afore mentioned functions to implement concatenation and iteration,
  3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with Path but with mpz_t* resp unsigned long** (which will be the case anyway).

But admittedly I love to split hairs. Tell me if you think that this is going too far :-P

No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.

Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happy with a C pointer I can control, but not with a framework that I do not understand.

Making it something like a container rather than a fully fledged parent would be an option, from my perspective.

comment:27 in reply to: ↑ 26 Changed 4 years ago by SimonKing

Replying to SimonKing:

Making it something like a container rather than a fully fledged parent would be an option, from my perspective.

... I mean, for BoundedIntegerSequence. Of course, Path must have a parent (the path semigroup of a quiver).

comment:28 Changed 4 years ago by nthiery

Just a random thought. Some issues here are of the same nature as what was discussed around ClonableElement and its subclasses. Maybe there is stuff to share with it, especially if one ends up inheriting from Element.

comment:29 Changed 4 years ago by SimonKing

I just had a look at ClonableElement, and one immediate thought is: Why are C-pointers allocated in __init__? That's clearly the purpose of __cinit__, in particular if said C-pointers are freed in __dealloc__.

Do we want to have sequences of bounded integers for use in Python? Then one possible way would be:

  • have some inline cdef functions that operate on certain (templated?) C data structures
  • use these in a sub-class of ClonableElement (so that we can have sequences of bounded integers used in python)
  • use these in a Path class, to be used in path semigroups.

comment:30 in reply to: ↑ 26 ; follow-up: Changed 4 years ago by ncohen

Yooooooo !!!

No. We have

cdef Integer NumberArrows
cdef Integer IntegerMask
cdef size_t ArrowBitSize, BigArrowBitSize
cdef block* bitmask = NULL
cdef block DataMask, BitMask
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize

Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?

For example, ModBAPB is used to replace n%BigArrowsPerBlock by a bit-wise and, which is noticeably faster.

Yepyep of course.

If it is, did you decided whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?

Depending on the implementation...

Okay....

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something :-)

Isn't duplication of code supposed to smell?

Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.

Do I understand correctly: You suggest that we should not provide a class BoundedIntegerSequence that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, mpz_t or unsigned long*) and can only be used in Cython code?

That's what I had in mind, but you make decisions here, pick what you like.

It's a bit like how bitsets are implemented. A C data structure, and an object wrapper on top of it. Or C Graphs, with a data structure on top of it. It is how Permutations should be implemented, too :-P

Well, you have a C object that you use when you want performance, and when you don't care you use the higher-level implementations.

Then it would be better to revert this ticket to its original purpose: There would be...

  1. ... cdef functions operating on mpz_t resp. on unsigned long*,
  2. ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the afore mentioned functions to implement concatenation and iteration,
  3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with Path but with mpz_t* resp unsigned long** (which will be the case anyway).

Well, I like this plan. What do you think ? Does it displease you in any way ?

No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.

I do not see how I could use it in my code at the moment, but I like the idea of having such a data structure somewhere, ready to be used. Most importantly, if you want speed, I think that having it available at two different levels is a safe bet. You write more code, but you will know where the time is spent in your time-critical functions.

Nathann

comment:31 in reply to: ↑ 30 ; follow-up: Changed 4 years ago by SimonKing

Hi Nathann,

Replying to ncohen:

Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?

Sure. And you want to do this same computation 10^7 times?

Isn't duplication of code supposed to smell?

Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.

Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).

That's what I had in mind, but you make decisions here, pick what you like.

It's a bit how bitsets are implemented.

Right, that's what I have looked at, and think makes sense here. Up to the constants.

Then it would be better to revert this ticket to its original purpose: There would be...

  1. ... cdef functions operating on mpz_t resp. on unsigned long*,
  2. ... a Cython class Path using mpz_t resp. unsigned long* as a cdef attribute, using the afore mentioned functions to implement concatenation and iteration,
  3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with Path but with mpz_t* resp unsigned long** (which will be the case anyway).

Well, I like this plan. What do you think ? Does it displease you in any way ?

No, it is one possibility.

comment:32 Changed 4 years ago by nthiery

  • Cc hivert added

comment:33 in reply to: ↑ 31 ; follow-up: Changed 4 years ago by ncohen

Sure. And you want to do this same computation 10^7 times?

I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?

Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).

I am thinking of this : http://stackoverflow.com/questions/4840819/c-template-specialization-with-constant-value

That's C++, and I don't see how to do it in Cython, though. The point is that you can parametrize a data structure with a constant value, and the data structure will be compiled for each value of the constant *that appears in the code at compile-time* (not all possible values, of course). The code is optimized while knowing the value (which can enable optimizations), and you can also do things that would not be possible otherwise, like "cdef int a[k]" where k is a (templated) variable.

Nathann

comment:34 in reply to: ↑ 33 ; follow-up: Changed 4 years ago by SimonKing

Replying to ncohen:

Sure. And you want to do this same computation 10^7 times?

I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?

Apparently I am not used to templated code...

I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing them as attributes of the sequences) for each sequence.

I am thinking of this : http://stackoverflow.com/questions/4840819/c-template-specialization-with-constant-value

That's C++, and I don't see how to do it in Cython, though.

Part of the question is: Do we want to use gmp as backend? Or do we want to (re-)implement operations on densely packed long* by ourselves? Given the timings, gmp is an option.

Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!

The next question then is how to access the code from Cython. I've done this with C, but not C++.

comment:35 in reply to: ↑ 34 ; follow-up: Changed 4 years ago by ncohen

Apparently I am not used to templated code...

I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing them as attributes of the sequences) for each sequence.

Nonono. Somehow, you can think of templates as a way to "script" the generation of code. You write some code with a constant k which is never defined anywhere, and then you say "compile one version of this code for each k in 1, 2, 3, 4, 5, 6". You end up with several data structures implemented, each with a hardcoded value. And then you can create an object cdef LongCompressedPath<5> my_path in which the hardcoded bound is 2^5. You just have many additional types, each with its own constant.

Part of the question is: Do we want to use gmp as backend? Or do we want to (re-)implement operations on densely packed long* by ourselves? Given the timings, gmp is an option.

Yep, you are right. Depends on how you weight the operations that you will need: it looks okay for concatenations and bad for startswith, but I expect that the most time consuming operation will be concatenation ? You are the one who needs it :-)

Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!

The next question then is how to access the code from Cython. I've done this with C, but not C++.

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of template can be written in pure Cython; do you mind if I ask the question on cython-devel (or did you want to send one yourself) ?

Nathann

comment:36 Changed 3 years ago by vbraun_spam

  • Milestone changed from sage-6.2 to sage-6.3

comment:37 in reply to: ↑ 35 ; follow-up: Changed 3 years ago by SimonKing

Hi Nathann,

sorry for the long silence!

Replying to ncohen:

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of template can be written in pure Cython; do you mind if I ask the question on cython-devel (or did you want to send one yourself) ?

Did you find out how templated code can be written in Cython? If you have asked on cython-devel (so far I have only posted on cython-users) or have another pointer, then please tell me. If not, tell me also...

Cheers, Simon

comment:38 Changed 3 years ago by SimonKing

Hi Nathann,

wouldn't the following be close to templates?

In Cython:

cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    return var+var

def show_me():
    cdef str a = 'a'
    cdef int b = 3
    print 'str', times_two(a)
    print 'int', times_two(b)

and this results in

sage: show_me()
str aa
int 6

comment:39 Changed 3 years ago by SimonKing

Or perhaps a better example, showing that the fused type is used to choose code at compile time: Put the following into a cell of the notebook

%cython
cimport cython

ctypedef fused str_or_int:
    str
    int
cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var*2
    else:
        return var+var
def test():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)

and look at the resulting code. The line a = times_two(a) becomes

 __pyx_v_a = __pyx_fuse_1__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_a, 0);

whereas b = times_two(b) becomes

__pyx_t_1 = __pyx_fuse_0__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_b, 0);

and one can check that Cython created two different C-functions __pyx_fuse_0... and __pyx_fuse_1... for the two possible (type-dependent!) versions of times_two.

comment:40 follow-up: Changed 3 years ago by SimonKing

And here is an example showing that it pays off, speed-wise:

In a .pyx file:

cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+var

cpdef untyped_times_two(var):
    return var+var

cpdef int int_times_two(int var):
    return var+var

cpdef str str_times_two(str var):
    return var+var

def test1():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)

def test2():
    cdef int a = 3
    cdef str b = 'b'
    a = untyped_times_two(a)
    b = untyped_times_two(b)

def test3():
    cdef int a = 3
    cdef str b = 'b'
    a = int_times_two(a)
    b = str_times_two(b)

Attach the .pyx file to Sage, and get this:

sage: %attach /home/king/Sage/work/templates/test.pyx
Compiling /home/king/Sage/work/templates/test.pyx...
sage: %timeit test1()
10000000 loops, best of 3: 159 ns per loop
sage: %timeit test2()
1000000 loops, best of 3: 199 ns per loop
sage: %timeit test3()
10000000 loops, best of 3: 168 ns per loop

EDIT: I have replaced the original example by something fairer, where it says var+var everywhere, but in the templated version Cython can choose the c-implementation of var+var at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.

EDIT 2: The new example also shows that choosing a specialised implementation manually is not faster than Cython using templates.

Last edited 3 years ago by SimonKing (previous) (diff)

comment:41 in reply to: ↑ 40 ; follow-up: Changed 3 years ago by ncohen

Hellooooooooooooooooo !!!

EDIT: I have replaced the original example by something fairer, where it says var+var everywhere, but in the templated version Cython can choose the c-implementation of var+var at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.

HMmmmmm... Does it make any difference ? gcc should resolve those "if" at compile-time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary, as the constants involved can be defined at compile-time.

Anyway, that's a good news ! :-D

Nathann

comment:42 in reply to: ↑ 37 Changed 3 years ago by ncohen

Yooooooooooo !!

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of template can be written in pure Cython; do you mind if I ask the question on cython-devel (or did you want to send one yourself) ?

Did you find out how templated code can be written in Cython? If you have asked on cython-devel (so far I have only posted on cython-users) or have another pointer, then please tell me. If not, tell me also...

Ahahah.. Well, no ... When I ask a question I do not do anything before I get an answer, soooooooooo I never did it :-P

I ask a question, then forget everything about it. When I get the answer I do what should be done :-P

Nathann

comment:43 in reply to: ↑ 41 ; follow-up: Changed 3 years ago by SimonKing

Hi Nathann,

Replying to ncohen:

HMmmmmm... Does it make any difference ? gcc should resolve those "if" at compile-time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary, as the constants involved can be defined at compile-time.

Yes, it is indeed resolved at compile time. Two c-functions are created, and only the part of the code relevant for the respective code branch appears in the c-function.

In other words, I now plan to create cdef functions like this:

ctypedef fused seq_t:
    first_implementation
    second_implementation

cdef seq_t concat(seq_t x, seq_t y):
    if seq_t is first_implementation:
        <do stuff for 1st implementation>
    else:
        <do stuff for 2nd implementation>

and then, in other modules (perhaps also in src/sage/combinat/words/word_datatypes.pyx?), one could uniformly call concat(a,b), provided that the types of a and b are known at compile time (or write concat[first_implementation](a,b) explicitly).

I think that such Cython code will be sufficiently nice to read, and the resulting C-code will be sufficiently fast.

Anyway, that's a good news ! :-D

Indeed!

Simon

comment:44 in reply to: ↑ 43 Changed 3 years ago by ncohen

I think that such Cython code will be sufficiently nice to read, and the resulting C-code will be sufficiently fast.

+1

Nathann

comment:45 Changed 3 years ago by SimonKing

Hmmm. Meanwhile I am a bit less positive about fused types. Let seq_t be a fused type for different implementations.

  • Fused types can not be used in iterators. I.e., the following is refused:
    def iter_seq(seq_t S):
        for <bla>:
            <do something>
            yield x
    
  • Fused types can not be used as attributes of cdef classes. I.e., the following is refused:
    cdef class BoundedIntSeq:
        cdef seq_t seq
        def __mul__(self, other):
            <do type check>
            cdef seq_t out = concat_seq(self.seq, other.seq)
            <create and return new instance of BoundedIntSeq using "out">
    

Hence, it seems that one needs to create two separate iterators, and it also seems that one needs code duplication. I.e., when impl1 and impl2 are two specialisations of the fused type seq_t, then one has to have

cdef class BoundedIntSeq_impl1:
    cdef impl1 seq
    def __mul__(self, other):
        ...
        cdef impl1 out = concat_seq(self.seq, other.seq)
cdef class BoundedIntSeq_impl2:
    cdef impl2 seq
    def __mul__(self, other):
        ...
        cdef impl2 out = concat_seq(self.seq, other.seq)

This mainly amounts to cut-and-paste, but is ugly!

I think I'll ask on cython-users if there are known ideas to solve this more elegantly.

comment:46 Changed 3 years ago by SimonKing

Some variation on the theme: One can create fused types from two cdef classes, and then one can work with specialisations of functions as follows:

ctypedef fused str_or_int:
    str
    int

# make the function do different things on different types,
# to better see the difference
cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+'+'+var

# forward declaration of cdef classes and their fusion
cdef class Bar_int
cdef class Bar_str
cdef fused Bar:
    Bar_int
    Bar_str

# a function that can be specialised to either of the two extension classes
def twice(Bar self):
    # and inside, we specialised a function depending
    # on the type of an attribute --- at compile time!!
    return type(self)(times_two(self.data))

cdef class Bar_str:
    cdef str data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return self.data
    # First possibility: Use the "templated" function as a method
    twice_ = twice[Bar_str]
    # second possibility: Wrap the "templated" function
    cpdef Bar_str twice__(self):
        return twice(self)

# The same, with the other Specialisation of <Bar>
cdef class Bar_int:
    cdef int data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "int(%d)"%self.data
    twice_ = twice[Bar_int]
    cpdef Bar_int twice__(self):
        return twice(self)

Time-wise, we see that wrapping the "templated" function is slower than using it as an attribute:

sage: b1 = Bar_str('abc')
sage: b1.twice_()
abc+abc
sage: b1.twice__()
abc+abc
sage: %timeit b1.twice_()
1000000 loops, best of 3: 759 ns per loop
sage: %timeit b1.twice__()
100000 loops, best of 3: 13.7 µs per loop
sage: b2 = Bar_int(5)
sage: b2.twice_()
int(10)
sage: b2.twice__()
int(10)
sage: %timeit b2.twice_()
1000000 loops, best of 3: 539 ns per loop
sage: %timeit b2.twice__()
100000 loops, best of 3: 9.2 µs per loop

Summary and further comments

  • We can reduce the code duplication to something like this:
    <define templated function foo_ depending on a fused type Bar with specialisations Bar1, Bar2,...>
    cdef class Bar1:
        foo = foo_[Bar1]
    cdef class Bar2:
        foo = foo_[Bar2]
    ...
    
    I hope this level of copy-and-paste can be afforded. At least, it gives good speed.
  • In my tests I found the following:
    • It is impossible to make foo_ a cpdef function. Otherwise, the assignment foo = foo_ fails with an error saying that foo_ can not be turned into a Python type.
    • It is impossible to choose the same name for the templated function and for the method it is turned into. Doing
      class Bar1:
          foo_ = foo_
      
      results in an error saying that Bar1 has no attribute foo_.
    • With the syntax above, both twice_ and twice__ have been put into the class' __dict__. I did not succeed in turning them into the equivalent of a cdef method.

comment:47 Changed 3 years ago by SimonKing

I did a lot of benchmark tests for the different implementations (using long* or char* or mpz_t to store sequences of bounded integers, using cdef functions that operate on fused types) and eventually found that using GMP (i.e., mpz_t) is fastest in all tests, by a wide margin.

Why did I not find this result before? Well, part of the reason is that GMP sometimes provides different functions to do the same job, and in the past few days I considerably improved my usage of the various possibilities. For example, mpz_t can be initialised in different ways. If one uses mpz_init, then the assignment that follows results in a re-allocation. Since the length of the concatenation of two tuples is known in advance, prescribing the number of to-be-allocated bits with mpz_init2 resulted in a speed-up of concatenation; it actually became twice as fast.
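To make that concrete, the concatenation pattern in question is roughly the following (a sketch, assuming the usual GMP declarations are cimported; out is expected to be uninitialised, and the function names are hypothetical):

cdef void concat_data(mpz_t out, mpz_t S1, mpz_t S2, size_t bitsize2):
    # The bitsize of the result is known in advance, so allocate it in
    # one go with mpz_init2; plain mpz_init would trigger an internal
    # re-allocation on the first assignment.
    mpz_init2(out, mpz_sizeinbase(S1, 2) + bitsize2)
    mpz_mul_2exp(out, S1, bitsize2)  # shift S1 left by the bitsize of S2
    mpz_add(out, out, S2)            # put the data of S2 into the low bits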

And this result is not surprising. After all, GMP has to implement the same bitshift and comparison operations for bit arrays that I was implementing in the "long*" approach, too. But GMP probably uses implementation techniques that I can not even dream of.

Now, my plan is to

  • create some ctypedef struct bounded_int_tuple that is based on mpz_t and is the underlying C structure for tuples of bounded integers (see the sketch after this list)
  • provide a collection of cdef (inline) functions, operating on bounded_int_tuple, for concatenation, copying, comparison and so on
  • wrap it in a cdef class BoundedIntegerTuple. This will probably not derive from Element: I think it makes no difference from the point of view of memory efficiency whether each BoundedIntegerTuple stores a pointer to the parent of "all tuples of integers bounded by B" or directly stores B as an unsigned int.
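
The struct from the first bullet could be as simple as this (a sketch, assuming the GMP types are cimported; names are hypothetical):

ctypedef struct bounded_int_tuple:
    mpz_t data           # all items, densely packed
    size_t length        # number of items in the sequence
    size_t itembitsize   # number of bits reserved per item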

In a subsequent ticket, I plan to use it for quiver paths, and if the combinat crowd cares, they can use it to make parts of sage.combinat.words a lot faster.

Comments? Requests? Suggestions of alternatives?

Last edited 3 years ago by SimonKing (previous) (diff)

comment:48 follow-up: Changed 3 years ago by ncohen

None from me. I keep being impressed at how much testing/researching you put into these quiver features... I really wonder what you will do with it in the end ;-)

Nathann

comment:49 Changed 3 years ago by SimonKing

  • Branch set to u/SimonKing/ticket/15820
  • Created changed from 02/14/14 17:31:31 to 02/14/14 17:31:31
  • Modified changed from 05/25/14 15:39:08 to 05/25/14 15:39:08

comment:50 Changed 3 years ago by SimonKing

  • Authors set to Simon King
  • Commit set to 5dc78c5f5f647bf95b98da916d7086fae539ffb8
  • Status changed from new to needs_review

I have pushed a branch with an initial implementation of bounded integer sequences. I took care of error handling: I think you will find that those GMP commands that may result in a memory error are wrapped in sig_on()/sig_off(), and the cdef boilerplate functions do propagate errors. In fact, in some of my tests, I was hitting Ctrl-C, and it worked.

The rest of this post is about benchmarks. I compare against Python tuples---which is of course ambitious, and you will find that tuples are not slower than bounded integer sequences in all situations. However, one should see it positively: the bounded integer sequences implemented here have features very similar to tuples, and there are operations for which they are a lot faster than tuples. It all depends on the length and the bound of the sequence. That's why I think that it is a reasonable contribution. And to make it faster (without Python classes), one can still use the cdef boilerplate functions. Generally, it seems that long bounded integer sequences are faster than long tuples, but short tuples are faster than short sequences. For hash, bounded integer sequences are pretty good, but they suck at accessing items, or at slicing with a step different from 1.

In contrast to tuples, bounded integer sequences also provide fairly quick tests for whether a sequence starts with a given sequence, or whether a sequence contains a certain sub-sequence. This is a feature I took from strings.

Here are the tests, with different sizes and bounds. Variable names starting with "T" hold tuples, those starting with "S" hold bounded integer sequences.

sage: from sage.structure.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)

Iteration

sage: timeit("x=[y for y in T0]", number=1000000)
1000000 loops, best of 3: 783 ns per loop
sage: timeit("x=[y for y in T1]", number=1000000)
1000000 loops, best of 3: 1.48 µs per loop
sage: timeit("x=[y for y in T2]", number=100000)
100000 loops, best of 3: 4.3 µs per loop
sage: timeit("x=[y for y in T3]", number=1000)
1000 loops, best of 3: 245 µs per loop
sage: timeit("x=[y for y in S0]", number=1000000)
1000000 loops, best of 3: 2.38 µs per loop
sage: timeit("x=[y for y in S1]", number=100000)
100000 loops, best of 3: 4.1 µs per loop
sage: timeit("x=[y for y in S2]", number=100000)
100000 loops, best of 3: 10.1 µs per loop
sage: timeit("x=[y for y in S3]", number=1000)
1000 loops, best of 3: 1.79 ms per loop

Slicing

Bounded integer sequences are immutable and hence copied by identity. But let us do slicing, dropping the last item:

sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.5 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 3.48 µs per loop

Slicing with step!=1 is much slower, though:

sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=1000)
1000 loops, best of 3: 2.23 ms per loop

Perhaps I should mark it "TODO"...

Accessing single items

Short sequences

sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 361 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 373 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 959 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 960 ns per loop

Large sequences (time also depends on the bound for the integer sequence!)

sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 359 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 382 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 1.97 µs per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 1.49 µs per loop

Comparison

Note that comparison of bounded integer sequences works differently from comparison of tuples or lists, as detailed in the documentation.

We compare sequences that are equal but non-identical, that differ in early items, or that differ in late items.

sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: T0x = tuple(L0x); T1x = tuple(L1x); T2x = tuple(L2x); T3x = tuple(L3x)
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: T0y = tuple(L0); T1y = tuple(L1); T2y = tuple(L2); T3y = tuple(L3)
sage: S1y = BoundedIntegerSequence(8, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)

Early differences:

sage: timeit("T0==T0x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T1==T1x", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("T2==T2x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T3==T3x", number=1000000)
1000000 loops, best of 3: 161 ns per loop
sage: timeit("S0==S0x", number=1000000)
1000000 loops, best of 3: 538 ns per loop
sage: timeit("S1==S1x", number=1000000)
1000000 loops, best of 3: 550 ns per loop
sage: timeit("S2==S2x", number=1000000)
1000000 loops, best of 3: 490 ns per loop
sage: timeit("S3==S3x", number=1000000)
1000000 loops, best of 3: 559 ns per loop

Equal sequences:

sage: timeit("T0==T0y", number=1000000)
1000000 loops, best of 3: 169 ns per loop
sage: timeit("T1==T1y", number=1000000)
1000000 loops, best of 3: 255 ns per loop
sage: timeit("T2==T2y", number=1000000)
1000000 loops, best of 3: 597 ns per loop
sage: timeit("T3==T3y", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: timeit("S0==S0y", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1==S1y", number=1000000)
1000000 loops, best of 3: 493 ns per loop
sage: timeit("S2==S2y", number=1000000)
1000000 loops, best of 3: 583 ns per loop
sage: timeit("S3==S3y", number=1000000)
1000000 loops, best of 3: 1.41 µs per loop

Late differences:

sage: T0z1 = T0+T0
sage: T0z2 = T0+T0x
sage: T1z1 = T1+T1
sage: T1z2 = T1+T1x
sage: T2z1 = T2+T2
sage: T2z2 = T2+T2x
sage: T3z1 = T3+T3
sage: T3z2 = T3+T3x
sage: timeit("T0z1==T0z2", number=100000)
100000 loops, best of 3: 206 ns per loop
sage: timeit("T1z1==T1z2", number=100000)
100000 loops, best of 3: 308 ns per loop
sage: timeit("T2z1==T2z2", number=100000)
100000 loops, best of 3: 640 ns per loop
sage: timeit("T3z1==T3z2", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z1==S0z2", number=100000)
100000 loops, best of 3: 585 ns per loop
sage: timeit("S1z1==S1z2", number=100000)
100000 loops, best of 3: 494 ns per loop
sage: timeit("S2z1==S2z2", number=100000)
100000 loops, best of 3: 555 ns per loop
sage: timeit("S3z1==S3z2", number=100000)
100000 loops, best of 3: 578 ns per loop

Hash

This is the only operation for which bounded integer sequences seem to be consistently faster than tuples:

sage: timeit("hash(T0)", number=100000)
100000 loops, best of 3: 179 ns per loop
sage: timeit("hash(T0)", number=1000000)
1000000 loops, best of 3: 177 ns per loop
sage: timeit("hash(T1)", number=1000000)
1000000 loops, best of 3: 267 ns per loop
sage: timeit("hash(T2)", number=1000000)
1000000 loops, best of 3: 584 ns per loop
sage: timeit("hash(T3)", number=100000)
100000 loops, best of 3: 45.3 µs per loop
sage: timeit("hash(S0)", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("hash(S1)", number=1000000)
1000000 loops, best of 3: 153 ns per loop
sage: timeit("hash(S2)", number=1000000)
1000000 loops, best of 3: 194 ns per loop
sage: timeit("hash(S3)", number=1000000)
1000000 loops, best of 3: 8.97 µs per loop

Concatenation

sage: timeit("T0+T0", number=1000000)
1000000 loops, best of 3: 200 ns per loop
sage: timeit("T1+T1", number=1000000)
1000000 loops, best of 3: 311 ns per loop
sage: timeit("T2+T2", number=1000000)
1000000 loops, best of 3: 742 ns per loop
sage: timeit("T3+T3", number=10000)
10000 loops, best of 3: 40.3 µs per loop
sage: timeit("S0+S0", number=1000000)
1000000 loops, best of 3: 576 ns per loop
sage: timeit("S1+S1", number=1000000)
1000000 loops, best of 3: 590 ns per loop
sage: timeit("S2+S2", number=1000000)
1000000 loops, best of 3: 618 ns per loop
sage: timeit("S3+S3", number=1000000)
1000000 loops, best of 3: 2.17 µs per loop

Subsequences

Recall the definition of S0z1 etc. We find:

sage: S0z2.startswith(S0)
True
sage: S0z2.startswith(S0x)
False
sage: S0x in S0z2
True
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 239 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 241 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 694 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 227 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 1.08 µs per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 247 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 3.21 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 989 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 218 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 3.57 ms per loop

I wonder if the latter could be improved. Another "TODO"...


New commits:

5dc78c5Implement sequences of bounded integers

comment:51 in reply to: ↑ 48 Changed 3 years ago by SimonKing

Replying to ncohen:

None from me. I keep being impressed at how much testing/researching you put into these quiver features... I really wonder what you will do with it in the end ;-)

  • Gröbner bases for modules over path algebra quotients.
  • In particular, an implementation of the non-commutative version of Faugère's F5 algorithm that I have described in my latest article.
  • Compute minimal generating sets for modules over so-called "basic algebras" (that's a special type of path algebra quotients), which is possible with the non-commutative F5, as shown in my latest article.
  • Use all this to compute minimal projective resolutions of basic algebras, but please be faster than what we currently do in our optional group cohomology spkg. Currently, we use a method of David Green for computing the minimal resolutions; but F5 should be more efficient---theoretically...

That said: I guess it would make sense to adopt bounded integer sequences in appropriate parts of sage.combinat.words, but I leave this to others.

comment:52 follow-up: Changed 3 years ago by ncohen

Hellooooooo Simon !

Sorry but I only began to read your code and now I should really go get some sleep. I paste my current notes here in the meantime, but I am a long way from a proper review job.

  • bounded integer sequence, biseq. What about "integer sequence" ? All integers are "bounded" on a computer. Why add this "bounded" everywhere ? biseq -> iseq ?
  • # Bitsize of "data" --> bitsize of the whole sequence
  • itembitsize --> necessarily a power of 2 ?
  • #cdef inline void dealloc_biseq(biseq_t S) : why there if commented ?
  • list2biseq (and others) -> list_to_biseq ?
  • Comments on the function definitions --> Move it to the function's doc, even if it is their only content ?
  • Could you write some doc, even if it is just one line, for cdef functions ? A bit like what we have for bitsets ? Something like
cdef biseq_t* allocate_biseq(size_t l, unsigned long int itemsize):
     """
     Allocates a sequence of l integers stored on itemsize bits each.
     """
  • list2biseq --> I wondered why this function wouldn't return a new biseq built from a list instead of modifying an existing one.... Performance issue ? Is that critical in some cases ?

Nathann

comment:53 in reply to: ↑ 52 Changed 3 years ago by SimonKing

Replying to ncohen:

  • bounded integer sequence, biseq. What about "integer sequence" ?

Not good, because...

All integers are "bounded" on a computer. Why add this "bounded" everywhere ? biseq -> iseq

... the computer's integer bound is chosen when you buy the machine, but not when you create the sequence.

The point is that each "bounded integer sequence" has a bound B that can be chosen upon initialisation so that any item of the sequence is smaller than B.

  • # Bitsize of "data" --> bitsize of the whole sequence

OK.

  • itembitsize --> necessarily a power of 2 ?

No. Here, it is really "bit" size. So, the number tells how many bits will be reserved to store one item. Hence, the bound B above is a power of two, namely 2^itembitsize.

  • #cdef inline void dealloc_biseq(biseq_t S) : why there if commented ?

Because I forgot to remove it.

  • list2biseq (and others) -> list_to_biseq ?

Acceptable.

  • Comments on the function definitions --> Move it to the function's doc, even if it is their only content ?
  • Could you write some doc, even if it is just one line, for cdef functions ?

I wouldn't remove the comments from the pxd file, but certainly some documentation should be added to the cdef functions in the pyx file. I wonder if this should be by comments in the code or by a doc string. After all, the doc string will not be visible (cdef functions won't appear in the docs, as they can not be imported).

  • I just figured out that you may have wanted to keep this list of functions above with short descriptions as an 'index' of what the file does. If so, it would be cool to do it in the html doc instead !

You mean: "This file provides the following boilerplate functions, that you can cimport for usage in Cython code: ..."?

  • list2biseq --> I wondered why this function wouldn't return a new biseq built from a list instead of modifying an existing one.... Performance issue ? Is that critical in some cases ?

This is motivated by my usage in the BoundedIntegerSequence wrapper. As usual, there is __cinit__ which does memory allocations (as far as I know, it is recommended not to do this kind of thing in __init__). Hence, when __init__ is called (and I think filling data into the allocated memory is the job of __init__, not of __cinit__), then the integer sequence is already allocated, and just needs to be filled.
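Schematically, the division of labour is the following (a sketch with hypothetical signatures, not the actual code on the branch):

cdef class BoundedIntegerSequence:
    cdef biseq_t data
    def __cinit__(self, bound, items):
        # __cinit__ runs exactly once, before __init__, and is matched
        # by __dealloc__: memory allocation belongs here.
        allocate_biseq(&self.data, len(items), bitsize_of(bound))
    def __init__(self, bound, items):
        # __init__ merely fills the memory that __cinit__ allocated.
        list_to_biseq(self.data, items)
    def __dealloc__(self):
        dealloc_biseq(self.data)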

comment:54 follow-up: Changed 3 years ago by mmezzarobba

Sorry if that's a stupid question (I only glanced through the code), but why are you using mpz's and not mpn's? It looks like you are already doing most if not all memory management by hand...

comment:55 in reply to: ↑ 54 ; follow-up: Changed 3 years ago by SimonKing

Replying to mmezzarobba:

Sorry if that's a stupid question (I only glanced through the code), but why are you using mpz's and not mpn's?

Simply because I did not succeed in finding what functions are available at the mpn_* level. So, if you can give me a pointer (similar to the documentation for `mpz_t`), I'd appreciate it. Probably unsigned types will be a tad faster than signed ones.

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so? I mean, what I need is some tool to quickly operate on large bit arrays by shift operations, preferably also with a good and fast hash function, and without the need to think about the number of bits fitting into a byte or a long. I tried to implement this myself, using char* or long* (which would also mean to take care of allocating and freeing), but mpz_t turns out to be faster.

So, if you know another tool, that is perhaps more directly dedicated to bit arrays, then please tell!

comment:56 in reply to: ↑ 55 ; follow-up: Changed 3 years ago by mmezzarobba

Replying to SimonKing:

Simply because I did not succeed in finding what functions are available at the mpn_* level. So, if you can give me a pointer (similar to the documentation for `mpz_t`), I'd appreciate it.

The documentation for mpn_* is at https://gmplib.org/manual/Low_002dlevel-Functions.html

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so?

I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of mpz_t over mpn_* when you are dealing with nonnegative integers (hence my first question).

Last edited 3 years ago by mmezzarobba (previous) (diff)

comment:57 in reply to: ↑ 56 ; follow-up: Changed 3 years ago by SimonKing

Replying to mmezzarobba:

The documentation for mpn_* is at https://gmplib.org/manual/Low_002dlevel-Functions.html

Thank you, I'll have a look and will try to compare.

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so?

I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of mpz_t over mpn_* when you are dealing with nonnegative integers (hence my first question).

Now I understand what you mean by your statement that I did all memory management by hand: I am not using mpz_init but mpz_init2, telling exactly how many bits I will need. I found that it has a noticeable impact on performance, since otherwise re-allocations will happen internally.

comment:58 in reply to: ↑ 57 Changed 3 years ago by mmezzarobba

Replying to SimonKing:

Now I understand what you mean by your statement that I did all memory management by hand: I am not using mpz_init but mpz_init2

Yes, exactly. Sorry if I wasn't clear!

comment:59 Changed 3 years ago by git

  • Commit changed from 5dc78c5f5f647bf95b98da916d7086fae539ffb8 to efef80bea22840f618335b7dd6f5d3e9a64fa301

Branch pushed to git repo; I updated commit sha1. New commits:

7ceac67Merge branch 'develop' into ticket/15820
efef80bRelocate bounded integer sequences (sage.misc, not sage.structure)

comment:60 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Document cdef functions

I have at some point merged the latest beta. Sorry, I keep forgetting that merging stuff prematurely is bad. Main point was to relocate the code from sage.structure to sage.misc.

I don't think I will use mpn_*. It just seems too low-level. For example, I see mpn_lshift and mpn_rshift operations, but they operate on limbs, not on bit arrays. Hence, in order to implement a bitshift on bit arrays (which is what I need in concatenation and many other operations), I'd still need to invest a lot of work, essentially what I have done with the proof-of-concept using long* to store bit arrays. I doubt that in the end it would be any faster than using mpz_mul_2exp and friends.

Documentation of the cdef functions is still missing; this is what I intend to do next.

comment:61 follow-up: Changed 3 years ago by fredrik.johansson

If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n^2) instead of O(n). This makes the implementation rather useless for a lot of applications where a datatype like this would be useful (say if you want to store a million small integers in a memory-efficient way). With mpn functions, you can get the optimal complexity.
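For illustration, the O(1) access that the mpn layer allows is plain bitfield extraction from the limb array; a sketch, assuming itembits < GMP_LIMB_BITS and that mp_limb_t and GMP_LIMB_BITS are available from the GMP headers (names hypothetical):

cdef mp_limb_t get_item(mp_limb_t* data, size_t index, size_t itembits):
    cdef size_t bitpos = index * itembits
    cdef size_t limb = bitpos // GMP_LIMB_BITS
    cdef size_t offset = bitpos % GMP_LIMB_BITS
    cdef mp_limb_t out = data[limb] >> offset
    if offset + itembits > GMP_LIMB_BITS:
        # the item straddles two limbs: fetch the missing high bits
        out |= data[limb + 1] << (GMP_LIMB_BITS - offset)
    return out & ((<mp_limb_t>1 << itembits) - 1)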

comment:62 in reply to: ↑ 61 Changed 3 years ago by SimonKing

  • Work issues changed from Document cdef functions to Document cdef functions. Pickling

Replying to fredrik.johansson:

If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n^2) instead of O(n).

Right, iteration, access to single items and creation from a list are not as fast as they could be. Can you point me to examples (in Sage? Elsewhere?) showing how to use mpn_*?

PS: I also forgot to implement pickling.

comment:63 Changed 3 years ago by git

  • Commit changed from efef80bea22840f618335b7dd6f5d3e9a64fa301 to 64ac7bab9d4c85136e9845878d69d82be7c300f8

Branch pushed to git repo; I updated commit sha1. New commits:

64ac7baPickling of bounded integer sequence. Documentation of cdef functions

comment:64 Changed 3 years ago by SimonKing

  • Work issues changed from Document cdef functions. Pickling to Use mpn_* for speedup of iteration and item access

I have documented the cdef functions, and I have implemented pickling. Again (at least for long sequences), this is faster than for ordinary tuples:

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: %timeit loads(dumps(S))
1000 loops, best of 3: 407 µs per loop
sage: T = tuple(S)
sage: %timeit loads(dumps(T))
100 loops, best of 3: 2.02 ms per loop

I hope there is a nice example for mpn_* somewhere.

comment:65 Changed 3 years ago by SimonKing

Hooray, using mpn_* improves the time to access item 2000 in an integer sequence of bound 32 from 1.49 µs to 545 ns. With tuples, the access time is 362 ns. So, that's almost competitive.

comment:66 Changed 3 years ago by SimonKing

Another hooray: Conversion of a list of 5000 integers to a bounded integer sequence of bound 32 improved from 3.25 ms to 75.3 µs. Conversion to a tuple only takes 19.6 µs, but I doubt that this will be possible to beat.

comment:67 Changed 3 years ago by git

  • Commit changed from 64ac7bab9d4c85136e9845878d69d82be7c300f8 to 836f63fb96241033953e42d6d18becfaa4fe33b7

Branch pushed to git repo; I updated commit sha1. New commits:

050b118Improve access to items of bounded integer sequences
c2e3547Improve conversion "list->bounded integer sequence"
836f63fImprove iteration and list/string conversion of bounded integer sequences

comment:68 Changed 3 years ago by SimonKing

It remains to improve the computation of the index of an item or of a sub-sequence, and slicing. When this is done, I'll provide updated benchmarks.

comment:69 Changed 3 years ago by git

  • Commit changed from 836f63fb96241033953e42d6d18becfaa4fe33b7 to c8a299ba54fa05d6851fd492019d9996e7e31515

Branch pushed to git repo; I updated commit sha1. New commits:

c8a299bMore documentation of bounded integer sequences

comment:70 Changed 3 years ago by git

  • Commit changed from c8a299ba54fa05d6851fd492019d9996e7e31515 to 735939e593c9c302ff42c4973cbfe2e48eb22644

Branch pushed to git repo; I updated commit sha1. New commits:

775d795Improve index computation for bounded integer sequences
391a102Improve bounded integer subsequent containment test
735939eImprove slicing of bounded integer sequences

comment:71 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

Using the mpn_* functions was a very good idea. Some basic operations are a lot faster now. Here I am repeating (and slightly extending) my earlier benchmarks for those operations that have changed in the recent commits:

The test bed

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)

Conversion list -> tuple/sequence

sage: %timeit x = tuple(L0)
1000000 loops, best of 3: 307 ns per loop
sage: %timeit x = tuple(L1)
1000000 loops, best of 3: 369 ns per loop
sage: %timeit x = tuple(L2)
1000000 loops, best of 3: 534 ns per loop
sage: %timeit x = tuple(L3)
10000 loops, best of 3: 19.6 µs per loop
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.34 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.57 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.7 µs per loop

Conversion to a tuple seems to be roughly 4 times faster. But I don't think this will be critical (at least not in my own applications...)

Conversion tuple/sequence -> list

sage: %timeit x = list(T0)
1000000 loops, best of 3: 537 ns per loop
sage: %timeit x = list(T1)
1000000 loops, best of 3: 624 ns per loop
sage: %timeit x = list(T2)
1000000 loops, best of 3: 752 ns per loop
sage: %timeit x = list(T3)
100000 loops, best of 3: 22.6 µs per loop
sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.58 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.19 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 253 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 1.05 µs per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.87 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 4.95 µs per loop
sage: %timeit x = S3.list()
1000 loops, best of 3: 306 µs per loop

The gap between lists and sequences strongly depends on the bound of the sequence, which is not much of a surprise. Anyway, it is faster than before.

Slicing

For step 1, short tuples are faster than short sequences, but long sequences are faster than long tuples:

sage: timeit("x=T0[:-1]", number=100000)
100000 loops, best of 3: 479 ns per loop
sage: timeit("x=T1[:-1]", number=100000)
100000 loops, best of 3: 548 ns per loop
sage: timeit("x=T2[:-1]", number=100000)
100000 loops, best of 3: 773 ns per loop
sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.6 µs per loop
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.75 µs per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.8 µs per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.77 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.4 µs per loop

As before, a step different from 1 is bad for sequences. However, there is an improvement compared with the previous timings:

sage: timeit("x=T2[:-1:2]", number=100000)
100000 loops, best of 3: 944 ns per loop
sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.8 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop

Accessing single items

Bounded integer sequences can now compete with tuples, both short and long!

sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 349 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 351 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 346 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 347 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 350 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 370 ns per loop

Containment tests

Here we talk about operations that do not exist in this form for tuples (e.g., subsequence tests). They have been improved by the latest commits:

sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 236 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 215 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 219 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 692 ns per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 238 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 228 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 1.92 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 981 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 213 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.41 ms per loop

Conclusion

  • Slicing and iteration are still a weak point of bounded integer sequences. However, there is an improvement by the latest commits.
  • Access to single items is now competitive. Some other operations have been competitive or superior before.
  • Probably iteration and slicing could be improved with more complicated code: Currently, I take a single item, do a bitshift (which is faster than before!) and then proceed with the next item. But since several items usually fit into one "limb" (this is how GMP stores the data), it would be possible to accumulate a group of items into one limb, then do a single bitshift for this group of items, and then proceed to the next group of items (see the sketch after this list). I would consider implementing it if my own applications show the need for it...
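
A sketch of that idea, assuming for simplicity that items do not straddle limb boundaries, i.e. GMP_LIMB_BITS is divisible by itembits (names hypothetical):

cdef void iterate_limbwise(mp_limb_t* data, size_t nlimbs, size_t itembits):
    # Read each limb once and peel the items off it locally, instead of
    # doing one full-width bitshift per item.
    cdef size_t items_per_limb = GMP_LIMB_BITS // itembits
    cdef mp_limb_t mask = (<mp_limb_t>1 << itembits) - 1
    cdef mp_limb_t limb
    cdef size_t i, j
    for i in range(nlimbs):
        limb = data[i]
        for j in range(items_per_limb):
            handle_item(limb & mask)  # hypothetical per-item callback
            limb >>= itembits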

Needs review, I think! Please check if I produced a memory leak (by not freeing allocated temporary data), a memory corruption (by messing up in the case of items that are stored exactly on the seam between two limbs), or functions in which a memory error or keyboard interrupt would not be propagated. I plan to do such tests, too, but I suppose 4 or 6 eyes see more than 2.

comment:72 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues changed from Use mpn_* for speedup of iteration and item access to Fix flakyness in adding sequences

I found failures in some runs:

king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-10-7f6bcfa0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1059, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
    S+T
Expected:
    <4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
Got:
    <4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>
**********************************************************************
1 item had failures:
   1 of  10 in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
    [178 tests, 1 failure, 0.79 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-15-6cf37501.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-20-d43d6691.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1061, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
    T+S
Expected:
    <4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
Got:
    <4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 25>
**********************************************************************
1 item had failures:
   1 of  10 in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
    [178 tests, 1 failure, 0.78 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-25-04f71350.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-29-a2a81517.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-33-e89cc5fe.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.81 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx 
Running doctests with ID 2014-06-03-13-33-38-83426cf0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
    [178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
    cpu time: 0.8 seconds
    cumulative wall time: 0.8 seconds

My first guess is that this comes from using sage_malloc in some places, i.e., I should zero the memory first.

comment:73 Changed 3 years ago by SimonKing

There is something seriously broken, apparently in the last limb of the stored data:

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: T = BoundedIntegerSequence(21, [4,1,6,2,8,15])
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
sage: T+S
<4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>

My current guess is that in some place I get the number of limbs wrong when I manually assign it to the underlying mpz_t.

comment:74 Changed 3 years ago by SimonKing

Now I can explain what goes wrong in above example.

GMP has its limbs, and I have my sequence items. In the example, each item consumes 5 bits, but each limb comprises 32 bits. As it happens, the last bit of my last item is 0, and the last limb only stores this single zero bit. Hence, when I ask GMP to do high-level operations (which makes sense for concatenation; I wouldn't like to do it low-level), GMP tries to be clever and drops the last limb, since it is zero. But when I transform the result to a list, I want to access the last bit. Since the last bit is no longer stored in any limb, it belongs to unallocated memory. Hence, randomly, the last bit can be 0 or 1, which explains why in the example above the difference between the correct and the wrong last item is 16 (the top bit of a 5-bit item is worth exactly 16).

So, what I should do: When transforming a sequence to a string or list (or when iterating), I should check whether I am running outside of GMP's limbs. Alternatively, I should put a 1-bit right behind my last item, so that GMP will not drop the last limb (as I forced it to be non-zero).
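
A minimal sketch of the first option (the helper name is hypothetical; the GMP declarations are as in the cython() snippet in comment:93 below):

cdef inline mp_limb_t limb_or_zero(__mpz_struct* z, mp_size_t i):
    # GMP may have normalized away trailing zero limbs; a limb at index
    # >= _mp_size no longer exists and must be read as zero.
    if i >= z._mp_size:
        return 0
    return z._mp_d[i]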

comment:75 Changed 3 years ago by SimonKing

Alternatively, I could replace all mpz_* by mpn_*. But that would be boring.

comment:76 Changed 3 years ago by SimonKing

PS: Adding sequences that are constant zero exhibits the same problem.

comment:77 Changed 3 years ago by SimonKing

I found examples that used to trigger the above-mentioned problems reliably. So, they will be added to the docs. I fixed the iteration problem, but I am still struggling with pickling.

comment:78 Changed 3 years ago by SimonKing

Very odd. Could it be that I hit a bug in GMP in the conversion of an integer to a string representation at base 2^n? Namely, in my examples, the 32-adic string representation used to pickle a bounded integer sequence does not represent the stored bit array, whereas the 31-adic or 10-adic string representation works fine (but is probably slower).

This seems to happen when the last limb used to store a GMP integer is in fact zero.

comment:79 Changed 3 years ago by SimonKing

Yes, setting the _mp_size field of __mpz_struct in such a way that the limb _mp_d[_mp_size-1] is non-zero solves the problem. I thought GMP would cope with "superfluous" zeroes, but in string representation it does not.

So, I have to check where I have been lazy with the _mp_size field when using mpn_* functions.
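
Concretely, the repair amounts to something like the following hedged sketch (helper name hypothetical; it assumes the stored value is non-negative, which is always the case for sequence data):

cdef inline void set_normalized_size(__mpz_struct* z, mp_size_t n):
    # After writing n limbs with mpn_* functions, shrink the size until
    # the most significant limb is non-zero; this is the normal form that
    # GMP's string conversion (and other mpz_* functions) rely on.
    while n > 0 and z._mp_d[n-1] == 0:
        n -= 1
    z._mp_size = n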

comment:80 Changed 3 years ago by SimonKing

Current status is a struggle with sub-sequence recognition in the presence of trailing zeroes.

Here is the type of examples that I propose:

  • Use a sequence of length 7 with a bound of 32. Then, we need 35 bits to store the sequence.
  • At least on my machine, a GMP limb comprises 32 bits. Hence, the above sequence fits into two limbs, and we only use three bits of the second limb.
  • Choose the last item of the sequence such that it fits into 2 bits. Then, the three bits stored in the second limb are zero.
  • Now, do an operation that involves an mpz_* function to create a new bounded integer sequence, only adding trailing zeroes. Currently this happens in concatenation and slicing. It may result in GMP cutting off the trailing zeroes.

When we then do operations involving mpn_* functions, we need to take care of the cut-off bits (and treat them as zero). My current private branch (not pushed yet) contains several tests that fail with the branch currently attached here, but succeed in my current branch.

Still failing is this:

            sage: X = BoundedIntegerSequence(21, [4,1,6,2,7,2,3])
            sage: S = BoundedIntegerSequence(21, [0,0,0,0,0,0,0])
            sage: loads(dumps(X+S))
            <4, 1, 6, 2, 7, 2, 3, 0, 0, 0, 0, 0, 0, 0>
            sage: loads(dumps(X+S)) == X+S
            True
            sage: T = BoundedIntegerSequence(21, [0,4,0,1,0,6,0,2,0,7,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
            sage: T[3::2]==(X+S)[1:]
            True
            sage: T[3::2] in X+S
            False     # should return True!

EDIT: Recall that providing bound 21 will internally be changed into the next power of two, hence 32.
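
In other words (a small illustrative Python sketch, not the ticket's code):

def effective_bound(bound):
    # Items are in {0, ..., bound-1}; a bound of 21 means items need
    # 5 bits each, so the bound is internally rounded up to 2^5 = 32.
    itembitsize = (bound - 1).bit_length()
    return 1 << itembitsize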

Last edited 3 years ago by SimonKing

comment:81 Changed 3 years ago by git

  • Commit changed from 735939e593c9c302ff42c4973cbfe2e48eb22644 to c900a2cd4045aa3463be31d76c9f047b05571958

Branch pushed to git repo; I updated commit sha1. New commits:

c900a2c Take care of GMP's removal of trailing zeroes

comment:82 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues Fix flakiness in adding sequences deleted

I solved the problems with wrong sizes / trailing zeroes. I added all tests that used to fail with previous versions as doctests. All tests are passing now. Hence: Needs review again. Tomorrow, I'll provide new timings, as the new sanity checks may result in a small slow-down. We will see...

comment:83 follow-up: Changed 3 years ago by SimonKing

Repeating the tests, in each case (even when the underlying function has not changed) stating the timing of the old code:

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.4 µs per loop  # was 1.34 µs
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.59 µs per loop # was 1.57 µs
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.13 µs per loop  # was 2.06 µs
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.3 µs per loop   # was 76.7 µs

It surprises me that some of the following became faster:

sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.55 µs per loop # was 1.58 µs
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.14 µs per loop  # was 2.06 µs
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.31 µs per loop  # was 4.19 µs
sage: %timeit x = list(S3)
1000 loops, best of 3: 274 µs per loop     # was 253 µs
sage: %timeit x = S0.list()
1000000 loops, best of 3: 810 ns per loop  # was 1.05 µs
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.22 µs per loop # was 1.87 µs
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.9 µs per loop   # was 4.95 µs
sage: %timeit x = S3.list()
10000 loops, best of 3: 105 µs per loop    # was 306 µs
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop  # was 1.75 µs
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop  # was 1.8 µs
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.78 µs per loop  # was 1.77 µs
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.46 µs per loop  # was 2.4 µs
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.68 µs per loop   # was 2.7 µs
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop   # was 52.2 µs

The following apparently became slower:

sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 370 ns per loop  # was 354 ns
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 375 ns per loop  # was 355 ns
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 371 ns per loop  # was 350 ns
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 391 ns per loop  # was 370 ns
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 233 ns per loop  # was 230 ns
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 238 ns per loop  # was 236 ns
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 457 ns per loop  # was 511 ns
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 219 ns per loop  # was 215 ns
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop  # was 219 ns
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 813 ns per loop  # was 692 ns
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 244 ns per loop  # was 238 ns
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop  # was 228 ns
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 5.59 µs per loop # was 1.92 µs
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 990 ns per loop  # was 981 ns
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 216 ns per loop  # was 213 ns
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.33 ms per loop    # was 2.41 ms

So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as a replacement for tuples in some applications.

comment:84 in reply to: ↑ 83 Changed 3 years ago by ncohen

So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as a replacement for tuples in some applications.

Especially since they are probably MUCH more compact in memory.

Nathann

comment:85 Changed 3 years ago by git

  • Commit changed from c900a2cd4045aa3463be31d76c9f047b05571958 to 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e

Branch pushed to git repo; I updated commit sha1. New commits:

1192c21 Allow empty slices; bounded integer sequence -> bool

comment:86 Changed 3 years ago by SimonKing

I just fixed a segfault that occurred for empty slices, and I implemented conversion of a sequence to bool (in the usual way: a sequence is nonzero if and only if it has positive length). Still needs review...
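
Schematically, the new conversion is just the following (field access hedged, following the data layout shown later in comment:108):

def __nonzero__(self):
    # A sequence is true if and only if it is non-empty.
    return self.data.length > 0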

comment:87 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Fix other out of bound errors

Sigh. I hate trailing zeroes.

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: (BoundedIntegerSequence(21, [0,0]) + BoundedIntegerSequence(21, [0,0]))
<16, 0, 30, 30>

comment:88 Changed 3 years ago by SimonKing

My original plan was to provide a function to compute the maximal overlap of sequences in the follow-up ticket #16453. However, while I am fixing bugs anyway, I'll move this new function here.

comment:89 Changed 3 years ago by git

  • Commit changed from 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e to ff4477a1600b44031c2fb962ff3c5ab24c110410

Branch pushed to git repo; I updated commit sha1. New commits:

ff4477a Remove biseq_to_str. Add max_overlap. Fix boundary violations

comment:90 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

Done!

Nathann, while I was at it, I have removed the biseq_to_str function, as we have discussed off-trac.

comment:91 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues changed from Fix other out of bound errors to Fix containment test

Again, there is trouble with zeroes. The sequence <0> is currently considered a subsequence of the sequence <1, 3>:

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S1 = BoundedIntegerSequence(3,[1,3])
sage: S2 = BoundedIntegerSequence(3,[0])
sage: S2 in S1    # very wrong
True
sage: S1.startswith(S2) # correct
False

comment:92 Changed 3 years ago by SimonKing

It seems that the problem lies in mpz_congruent_2exp_p(A, B, nbits): In the problematic situation, A==0, B==13 and nbits==2. The function should return "False", since 13 is not congruent to 0 modulo 4 (i.e., modulo two bits). But it doesn't. And that's totally weird. I'll try to reproduce it independently of the bounded integer sequences, i.e., I will see if it is a horrible GMP bug. Probably it isn't.

comment:93 Changed 3 years ago by SimonKing

Yes, it isn't:

sage: cython("""
....: include "sage/libs/ntl/decl.pxi"
....: def test(x,y):
....:     cdef mpz_t a,b
....:     mpz_init_set_ui(a, x)
....:     mpz_init_set_ui(b, y)
....:     print mpz_congruent_2exp_p(a,b,2)
....:     print mpz_get_ui(a), "vs", mpz_get_ui(b)
....:     print (<__mpz_struct*>a)._mp_size
....:     print (<__mpz_struct*>b)._mp_size
....:     mpz_clear(a)
....:     mpz_clear(b)
....: """)
sage: test(0,13)
False
0 vs 13
0
1

My current impression is that in the same situation as above, mpz_congruent_2exp_p gives the wrong result (True) in the bounded integer sequence code. Strange.

comment:94 Changed 3 years ago by SimonKing

Argh, now I see the difference: By a bug, I somehow managed to give the number 13 the size zero. With _mp_size being zero, GMP reads the value as 0, so mpz_congruent_2exp_p believes that 13==0 and correctly concludes that it is congruent to 0 modulo 4.

comment:95 Changed 3 years ago by git

  • Commit changed from ff4477a1600b44031c2fb962ff3c5ab24c110410 to 93546db866564a80d80df30de48934697e2b0d1d

Branch pushed to git repo; I updated commit sha1. New commits:

fd1675d Merge branch 'develop' into t/15820/ticket/15820
93546db Fix subsequence containment test.

comment:96 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues Fix containment test deleted

The problem is fixed. I inserted comments into the code that hopefully explain what happens in the subsequence containment test.

Doctests pass (including a new test that has previously failed), hence, it needs review again!

comment:97 Changed 3 years ago by vbraun_spam

  • Milestone changed from sage-6.3 to sage-6.4

comment:98 follow-up: Changed 3 years ago by chapoton

  • Status changed from needs_review to needs_work

still two failing doctests, see patchbot report

comment:99 in reply to: ↑ 98 Changed 3 years ago by SimonKing

Replying to chapoton:

still two failing doctests, see patchbot report

This is all very odd. All patchbots report the same error, but I don't see it on my own machine. So far, I don't know where I could start debugging.

comment:100 Changed 3 years ago by SimonKing

Is it perhaps the case that something relevant (GMP) was upgraded after sage-6.3.beta5? That's the version that I use on my laptop.

comment:101 Changed 3 years ago by SimonKing

Currently I won't upgrade to sage-6.4, as I am rather busy with another project. Could one of you please test if the error appears in sage-6.4 but not in sage-6.3?

comment:102 Changed 3 years ago by chapoton

I will test on 6.4.beta1. Building right now.

comment:103 Changed 3 years ago by chapoton

I confirm the doctest failures on 6.4.beta1:

sage -t --long src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1086, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
    S[1::2]
Expected:
    <1, 2, 20>
Got:
    <1, 6, 23>
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1088, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
    S[-1::-2]
Expected:
    <9, 7, 6, 4>
Got:
    <9, 7, 22, 15>


comment:104 Changed 3 years ago by SimonKing

Interesting!

So, what has changed between sage-6.3.b5 and sage-6.4.b1? Specifically, has something happened with GMP?

comment:105 Changed 3 years ago by SimonKing

PS: Can you reproduce it on the command line?

At some point I should upgrade, to fix the issue...

comment:106 Changed 3 years ago by chapoton

Yes, same behaviour in the command line.

comment:107 Changed 3 years ago by SimonKing

I have upgraded now, and I do not see the error.

So, what could we do to trace the problem down?

comment:108 Changed 3 years ago by SimonKing

Since I cannot reproduce the problem myself, could you please help me by providing some internal data? Perhaps there are different architectures (32bit or 64bit, big or little endian) at work?

Please add the following as a method of BoundedIntegerSequence:

    def inspect(self):
        cdef __mpz_struct seq
        seq = deref(<__mpz_struct*>self.data.data)
        print "bitsize",self.data.bitsize
        print "itembitsize",self.data.itembitsize
        print "mask_item", Integer(self.data.mask_item).binary()
        print "length", self.data.length
        print "GMP size", seq._mp_size
        print "GMP alloc", seq._mp_alloc
        print '.'.join([Integer(seq._mp_d[limb]).binary().rjust(mp_bits_per_limb,'0') for limb in range(seq._mp_size-1,-1,-1)])

With this, I get

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
mask_item 11111
length 7
GMP size 2
GMP alloc 3
00000000000000000000000000000010.01101000011100010001100000100100
sage: S[1::2]
<1, 2, 20>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
mask_item 11111
length 3
GMP size 1
GMP alloc 2
00000000000000000101000001000001
sage: S[-1::-2]
<9, 7, 6, 4>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
mask_item 11111
length 4
GMP size 1
GMP alloc 2
00000000000000100001100011101001

What do you get instead?

comment:109 Changed 3 years ago by chapoton

I get this:

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
mask_item 11111
length 7
GMP size 1
GMP alloc 2
0000000000000000000000000000001001101000011100010001100000100100
sage: S[1::2]
<1, 6, 23>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
mask_item 11111
length 3
GMP size 1
GMP alloc 2
0000000000000000000000000000000000010011110111111101110011000001
sage: S[-1::-2]
<9, 7, 22, 15>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
mask_item 11111
length 4
GMP size 1
GMP alloc 2
0000000000000001001101000011101011101100011101111101100011101001

comment:110 Changed 3 years ago by SimonKing

Thank you! So, one obvious difference is that on your computer, GMP uses chunks of 64 bits to store long integers, whereas it is only 32 bits on my machine (even though it is a 64-bit CPU). I'll see if I can make sense of the error.

comment:111 Changed 3 years ago by SimonKing

Another interesting point is that the single limb used to store S[1::2] resp. S[-1::-2] on your machine is full of junk bits.

Anyway, what surprises me is the following: Up to now, all problems arose because of a misfit where two limbs meet. But here, for the first time, we have errors occurring within a single limb.

comment:112 Changed 3 years ago by SimonKing

Hooray, I can reproduce the problem on my machine, with a different example!

The key is (as on your machine) to create a sequence that fits inside a single limb, and then slice:

sage: S = BoundedIntegerSequence(8, [4,1,6,2,7,2,5,5,2])
sage: S
<4, 1, 6, 2, 7, 2, 5, 5, 2>
sage: S[1::2]
<1, 6, 7, 7>
sage: S[-1::-2]
<2, 5, 7, 6, 7>

The sequence above is of length 9, with items fitting into 3 bits. Hence, 27 bits are used, whereas one limb comprises 32 bits.

So, stuff for debugging!

comment:113 Changed 3 years ago by SimonKing

Got it!

In some branches of the slice_biseq function, I forgot to restrict the to-be-copied data down to the bitsize of items (i.e., I forgot ...&S.mask_item). This oversight only happened in branches dealing with the case that everything fits within a single limb. Hence, no error in the old doctest for me (because it comprises two limbs on my machine), but an error for you (because it comprises a single limb on your machine).
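
In pure-Python terms, the missing mask looks like this (illustrative only; slice_biseq itself works on raw limbs):

def extract_item(word, offset, itembitsize):
    mask_item = (1 << itembitsize) - 1
    # Without "& mask_item", bits of neighbouring items sharing the same
    # word would leak into the result.
    return (word >> offset) & mask_item

For instance, with three 3-bit items 4, 1, 6 packed as 4 + (1 << 3) + (6 << 6) = 396, extract_item(396, 3, 3) correctly returns the middle item 1.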

comment:114 Changed 3 years ago by git

  • Commit changed from 93546db866564a80d80df30de48934697e2b0d1d to c69c67ccf3093a3c947ebbe83d5224daf695c3f9

Branch pushed to git repo; I updated commit sha1. New commits:

3b2f713 Merge branch 'develop' into t/15820/ticket/15820
c69c67c Use a bitmask when slicing bounded integer sequences

comment:115 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

The fix is pushed, and of course the example that was failing on my machine has become a new doctest.

comment:116 Changed 3 years ago by git

  • Commit changed from c69c67ccf3093a3c947ebbe83d5224daf695c3f9 to b331dc4f12e7291e9537f607ed155085cabc3fcd

Branch pushed to git repo; I updated commit sha1. New commits:

b331dc4 Fix mem leak converting zero-valued biseq_t to list

comment:117 Changed 3 years ago by SimonKing

I have fixed a memory leak: In the function biseq_to_list, a temporary buffer was allocated, but in the case that the sequence is zero-valued, the function returned without freeing the buffer. Solution: Only allocate the buffer after checking that the sequence is not zero-valued.

Still needing review (hint-hint)...

comment:118 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to memory corruption

I found that the following function (written into bounded_integer_sequences.pyx) manages to crash Sage:

def _biseq_stresstest():
    cdef int i
    from sage.misc.prandom import randint
    cdef list L = [BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))]) for y in range(100)]
    cdef int branch
    cdef BoundedIntegerSequence S, T
    for i from 0<=i<10000:
        if (i%1000) == 0:
            print i
        branch = randint(0,4)
        if branch == 0:    # concatenation
            L[randint(0,99)] = L[randint(0,99)]+L[randint(0,99)]
        elif branch == 1:  # slicing
            x = randint(0,99)
            if len(L[x]):
                y = randint(0,len(L[x])-1)
                z = randint(y,len(L[x])-1)
                L[randint(0,99)] = L[x][y:z]
            else:
                L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
        elif branch == 2:  # list/repr conversion
            t = list(L[randint(0,99)])
            t = repr(L[randint(0,99)])
            t = L[randint(0,99)].list()
        elif branch == 3:  # item access and index search
            x = randint(0,99)
            if len(L[x]):
                t = L[x][randint(0,len(L[x])-1)]
                try:
                    t = L[x].index(t)
                except ValueError:
                    print L[x]
                    print t
                    raise
            else:
                L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
        elif branch == 4:  # low-level containment/overlap functions
            S = L[randint(0,99)]
            T = L[randint(0,99)]
            startswith_biseq(S.data,T.data)
            contains_biseq(S.data, T.data, 0)
            max_overlap_biseq(S.data, T.data)

Can someone please see what happens on other machines? On mine (with the above function, but also with something that was supposed to be the cornerstone of my current project), crashes occur in sage_free or mpz_clear. Surprisingly, the variables to be freed/cleared should be just fine; in some cases they haven't even been touched after allocation.

My wild guess is that using sage_malloc (in bounded_integer_sequences.pyx) somehow interferes with GMP's (MPIR's?) memory management. Even if the reason is different, I suppose it would make sense to get rid of sage_malloc and sage_free in my code. After all, I use them for buffers whose length is known at compile time. Hence, I had better use arrays.

So, my plan is: Use arrays instead of sage_malloc, and hope that this is enough to let above function pass, which would then become a doctest.

Last edited 3 years ago by SimonKing

comment:119 follow-up: Changed 3 years ago by SimonKing

Hmm. Using, for example,

    cdef mp_limb_t tmp_limb[1]

instead of

    cdef mp_limb_t *tmp_limb
    tmp_limb = <mp_limb_t*>sage_malloc(sizeof(mp_limb_t))

compiles, but then it seems that the mpn_* functions do not correctly put values into the arrays, when I give them tmp_limb as argument.

Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?

comment:120 in reply to: ↑ 119 ; follow-up: Changed 3 years ago by SimonKing

Replying to SimonKing:

Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?

Forget the question. The problem was somewhere else. Now I think I can make arrays works as replacement for the current usage of sage_malloc.

comment:121 in reply to: ↑ 120 Changed 3 years ago by SimonKing

Replying to SimonKing:

Replying to SimonKing:

Question to C/Cython experts: Is tmp_limb not just of type <mp_limb_t*> when I define cdef mp_limb_t tmp_limb[1]?

Forget the question. The problem was somewhere else. Now I think I can make arrays works as replacement for the current usage of sage_malloc.

Do not forget the question. There is something extremely strange going on. When I insert a print statement after assigning the size of an __mpz_struct*, things work. Without the print statement, the assignment of the size is ignored and the value is incorrectly left at zero.

So, how can a print statement influence whether or not some value of a variable in GMP is assigned?

comment:122 Changed 3 years ago by git

  • Commit changed from b331dc4f12e7291e9537f607ed155085cabc3fcd to 7b53dc82d032d5a787e10731938f9048b620031b

Branch pushed to git repo; I updated commit sha1. New commits:

7b53dc8 Try to improve memory management for biseq_t, and add a stress test

comment:123 Changed 3 years ago by SimonKing

I have replaced sage_malloc by arrays, since the array size is known at compile time. I think this is a good idea anyway, and the doctests pass.

But not all is good, and here I need help: _biseq_stresstest() either crashes with free(): invalid next size (fast), or it raises a ValueError stating that some element is not in a bounded integer sequence, even though by construction it is contained. So, memory management is still going haywire.

Any clues or pointers?

Last edited 3 years ago by SimonKing

comment:124 Changed 3 years ago by SimonKing

I was running it under valgrind---and as a result, I did not see a crash! However, the other problem (a value error when a sequence is supposed to return the index of one of its elements) still persists under valgrind.

Concerning the crash: What can one do to further debug if a crash can not be reproduced with valgrind?

comment:125 Changed 3 years ago by SimonKing

PS: I can not quit sage when I run it under valgrind---"quit" has no effect!

comment:126 Changed 3 years ago by SimonKing

Even "pkill python" did not achieve to finish the valgrinded Sage session. Boy, am I fed up now. Time to call it a day.

comment:127 Changed 3 years ago by SimonKing

Actually I realise that I made a severe mistake all along: concat_biseq returns a pointer to a local variable, and I did so in order to make it possible to propagate errors (by declaring ... except NULL). But returning a pointer to a local variable cannot possibly work.

Since I want to propagate errors, I suppose it will be better to consistently work with pointers. I'll rewrite the module accordingly...
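
Schematically, the bug pattern was (hypothetical signature, deliberately broken, not the literal code):

cdef biseq_t* concat_broken(biseq_t S, biseq_t T) except NULL:
    cdef biseq_t out      # a struct on this function's stack
    # ... fill the fields of out ...
    return &out           # WRONG: out ceases to exist when the function returns

Making biseq_t itself a pointer to heap-allocated data lets a function return the value while still supporting "except NULL" error propagation.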

comment:128 Changed 3 years ago by SimonKing

It seems that that was indeed the reason! After rewriting everything so that biseq_t becomes a pointer to a struct (before, biseq_t was the struct), all doctests pass, and the stresstest doesn't crash.

But not all is good: The stresstest doesn't crash, but sporadically it cannot find the index of an element of a sequence. So, it needs work and I will not push yet, but it is almost done.

comment:129 Changed 3 years ago by SimonKing

The stresstest found that the following fails:

            sage: S = BoundedIntegerSequence(6, [3, 5, 3, 1, 5, 2, 2, 5, 3, 3, 4])
            sage: S[-1]
            4

Currently it returns 0.

comment:130 Changed 3 years ago by git

  • Commit changed from 7b53dc82d032d5a787e10731938f9048b620031b to 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2

Branch pushed to git repo; I updated commit sha1. New commits:

7c9f0f2 Biseq refactored, so that only pointers are passed around.
0aa3cbc Fix corner cases in item access for bounded integer sequences

comment:131 Changed 3 years ago by SimonKing

The stress test is really valuable. I found several small problems concerning corner cases. And, as I have announced, I have reworked the code so that only "proper" pointers rather than stupid references to local variables are passed around.

There remain problems, though. In the doctests, I have commented out the stresstest, because there are STILL crashes. This time, mpz_clear sometimes doesn't work. My wild guess is that at some point I write out of bounds when using mpn_*-functions.


New commits:

7c9f0f2 Biseq refactored, so that only pointers are passed around.
0aa3cbc Fix corner cases in item access for bounded integer sequences

comment:132 Changed 3 years ago by SimonKing

Hooray, by inserting explicit range checks, I found that I am indeed writing outside the allocated area in slice_biseq. That (and similar errors?) will hopefully explain the crashes...

comment:133 Changed 3 years ago by git

  • Commit changed from 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2 to f20dc096832f260edee2f28b3007bf42263f54ac

Branch pushed to git repo; I updated commit sha1. New commits:

f20dc09 Fix writing out of bounds, and assert that the bounds are respected

comment:134 follow-up: Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

In the new commit, I added assertions to make sure that I am not writing into non-allocated memory. So, the code should now be waterproof in that regard.

All tests pass, including the "stress test". So, it can be reviewed now!

Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?

comment:135 in reply to: ↑ 134 Changed 3 years ago by nthiery

Replying to SimonKing:

Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?

If you run Python with the option -O, all the assert calls are ignored. Something similar should be doable, at compile time of course, for Cython.

Most programming languages support this to encourage developers to never remove asserts. We really should have this feature in Sage. A good starting point would be to have sage -O call Python -O.

Cheers,

Nicolas

comment:136 Changed 3 years ago by git

  • Commit changed from f20dc096832f260edee2f28b3007bf42263f54ac to cea38bb0a7d6f748cfc1feef357df5e22eb7b652

Branch pushed to git repo; I updated commit sha1. New commits:

cea38bb Change doc according to the changed functionality of list_to_biseq

comment:137 Changed 3 years ago by SimonKing

  • Work issues memory corruption deleted

comment:138 Changed 3 years ago by git

  • Commit changed from cea38bb0a7d6f748cfc1feef357df5e22eb7b652 to 4611f8aa591616f118b50ebc2d2516c635e40318

Branch pushed to git repo; I updated commit sha1. New commits:

4611f8a Minor changes in the docs

comment:139 Changed 3 years ago by git

  • Commit changed from 4611f8aa591616f118b50ebc2d2516c635e40318 to 815d77c77f48605ea33b87a88120aaabadfb5f7f

Branch pushed to git repo; I updated commit sha1. New commits:

815d77c mpn_r/lshift should only be used with strictly positive shift

comment:140 Changed 3 years ago by SimonKing

This is to explain the most recent commit: By doing some non-commutative Gröbner basis computations relying on "biseq_t", I obtained crashes apparently caused by the slice_biseq function. It is using mpn_rshift, and it did so even when the shift was zero. However, I found in the docs that the shift is supposed to be at least 1. Hence, when there is no shift, I am now using mpn_copyi instead. Hope it helps to prevent the crash...
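
The guard in question, as a hedged sketch (argument names hypothetical; it assumes the mpn_* declarations are available):

cdef void rshift_or_copy(mp_limb_t* out, mp_limb_t* src, mp_size_t n, unsigned int shift):
    if shift > 0:
        # GMP documents mpn_rshift only for 1 <= shift < GMP_NUMB_BITS.
        mpn_rshift(out, src, n, shift)
    else:
        # No shift: a plain limb-by-limb copy does the job.
        mpn_copyi(out, src, n)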

comment:141 Changed 3 years ago by SimonKing

Argh. No, it did not fix the crash in my Gröbner basis application. Anyway, I still think the commit makes sense.

comment:142 Changed 3 years ago by git

  • Commit changed from 815d77c77f48605ea33b87a88120aaabadfb5f7f to 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a

Branch pushed to git repo; I updated commit sha1. New commits:

6dfb1cb Make contains_biseq interruptible and add to its doc

comment:143 follow-up: Changed 3 years ago by vbraun

I'm getting three valgrind warnings (#15586):

sage: sage: from sage.misc.bounded_integer_sequences import _biseq_stresstest
sage: _biseq_stresstest()
==3937== Invalid read of size 8
==3937==    at 0x30A524B0: __gmpn_rshift (rshift.c:55)
==3937==    by 0x48CCB50B: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2611)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937==  Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd
==3937==    at 0x4C29BCF: malloc (vg_replace_malloc.c:296)
==3937==    by 0x142EC8FD: sage_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x142EC9E4: sage_gmp_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x30A439EE: __gmpz_init2 (init2.c:33)
==3937==    by 0x48CCE302: __pyx_f_4sage_4misc_25bounded_integer_sequences_concat_biseq (bounded_integer_sequences.c:2256)
==3937==    by 0x48CD6806: __pyx_pf_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_31__add__ (bounded_integer_sequences.c:7201)
==3937==    by 0x48CD6806: __pyx_pw_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_32__add__ (bounded_integer_sequences.c:7078)
==3937==    by 0x4E80BC8: binary_op1 (abstract.c:945)
==3937==    by 0x4E80BC8: PyNumber_Add (abstract.c:1185)
==3937==    by 0x48CD39AD: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8281)
==3937==    by 0x48CD39AD: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937== Conditional jump or move depends on uninitialised value(s)
==3937==    at 0x48CCB5C3: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2828)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
==3937==    by 0x4F3F24F: PyEval_EvalCodeEx (ceval.c:3265)
==3937==    by 0x4F3F378: PyEval_EvalCode (ceval.c:673)

The middle one is almost certainly an error, the other two don't look too good either.

comment:144 in reply to: ↑ 143 Changed 3 years ago by SimonKing

Good that I added the stress test!

Replying to vbraun:

I'm getting three valgrind warnings (#15586):

sage: sage: from sage.misc.bounded_integer_sequences import _biseq_stresstest
sage: _biseq_stresstest()
==3937== Invalid read of size 8
==3937==    at 0x30A524B0: __gmpn_rshift (rshift.c:55)
==3937==    by 0x48CCB50B: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2611)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937==  Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd
==3937==    at 0x4C29BCF: malloc (vg_replace_malloc.c:296)
==3937==    by 0x142EC8FD: sage_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x142EC9E4: sage_gmp_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x30A439EE: __gmpz_init2 (init2.c:33)
==3937==    by 0x48CCE302: __pyx_f_4sage_4misc_25bounded_integer_sequences_concat_biseq (bounded_integer_sequences.c:2256)
==3937==    by 0x48CD6806: __pyx_pf_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_31__add__ (bounded_integer_sequences.c:7201)
==3937==    by 0x48CD6806: __pyx_pw_4sage_4misc_25bounded_integer_sequences_22BoundedIntegerSequence_32__add__ (bounded_integer_sequences.c:7078)
==3937==    by 0x4E80BC8: binary_op1 (abstract.c:945)
==3937==    by 0x4E80BC8: PyNumber_Add (abstract.c:1185)
==3937==    by 0x48CD39AD: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8281)
==3937==    by 0x48CD39AD: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]

Do I understand correctly: It means that contains_biseq could read and concat_biseq could write stuff in non-allocated memory?

==3937== Conditional jump or move depends on uninitialised value(s)
==3937==    at 0x48CCB5C3: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2828)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
==3937==    by 0x4F3F24F: PyEval_EvalCodeEx (ceval.c:3265)
==3937==    by 0x4F3F378: PyEval_EvalCode (ceval.c:673)

The middle one is almost certainly an error, the other two don't look too good either.

Will the given line numbers (2611, 2256, 2828) be the same on my machine?

Last edited 3 years ago by SimonKing

comment:145 follow-up: Changed 3 years ago by vbraun

I think they are, but I wouldn't bet anything on it ;-)

comment:146 follow-up: Changed 3 years ago by vbraun

1) is contains_biseq reading into unallocated memory, and 2) says that the invalid read is just beyond what concat_biseq allocated.

comment:147 in reply to: ↑ 146 Changed 3 years ago by SimonKing

Replying to vbraun:

1) is contains_biseq reading into unallocated memory, and 2) says that the invalid read is just beyond what concat_biseq allocated.

Sigh. Could be that I don't even see where 1), 2) and 3) are in the valgrind log...

Concerning the invalid read of size 8 by __gmpn_rshift: I think I recall that at some point I wanted to avoid considering cases and thus allowed rshifting memory that is partially uninitialised, since after the shift I would only access the bytes that were allocated. Probably this is improper and should be fixed.

Concerning Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd: I thought this means I am writing into non-allocated memory. But by your new comment, I understand that this just indicates where the memory accessed by the previous item (__gmpn_rshift) was allocated. I guess I need an explanation.

Concerning Conditional jump or move depends on uninitialised value(s): Is this independent of the two previous points? Or does it concern the same example and the same memory location?

comment:148 follow-up: Changed 3 years ago by vbraun

With the updated valgrind they are the only errors while running the testsuite(!). Trivial to spot. It's possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;-)

comment:149 in reply to: ↑ 148 Changed 3 years ago by SimonKing

Replying to vbraun:

With the updated valgrind they are the only errors while running the testsuite(!).

Wow, that's excellent news (except for my code :-/)!

Trivial to spot. It's possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;-)

True. Anyway, to fix it I probably need answers to my questions from comment:147 in order to know exactly what to do.

comment:150 Changed 3 years ago by vbraun

In the first one you are reading past the end of the buffer from unallocated memory. Either you need to allocate enough (round up) or also handle the cases where you have less than a full 8 bytes. Even though it's highly unlikely, reading past the end could even result in a segfault if the process space happens to end there.
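
The "round up" option amounts to ceil-division when computing how many limbs to allocate; illustratively in Python (a hedged sketch, not from the ticket):

def limbs_needed(nbits, bits_per_limb=64):
    # Allocate whole limbs covering nbits, so that a full-limb read at
    # the tail never leaves the allocated block.
    return (nbits + bits_per_limb - 1) // bits_per_limb

For instance, limbs_needed(35, 32) == 2, matching the two-limb example from comment:80.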

comment:151 Changed 3 years ago by fredrik.johansson

If it's an "invalid read of size 8" and "0 bytes after a block" in mpn_copyi, chances are it's not a bug. GMP/MPIR will sometimes read 16 aligned bytes at a time when there are just 8 bytes left to copy, on certain architectures where this is safe. This might be the case in mpn_rshift too.

If you're absolutely sure that your code should be correct there, you could try compiling MPIR with assembly optimizations disabled and see if the error goes away. Or easier, temporarily replace the mpn_rshift in your code with a plain C implementation (e.g. the one in mpn/generic/rshift.c in MPIR).

The "Conditional jump or move depends on uninitialised value" is absolutely a bug.

comment:152 in reply to: ↑ 145 Changed 3 years ago by SimonKing

Replying to vbraun:

I think they (the line numbers, SK) are (the same on different machines, SK), but I wouldn't bet anything on it ;-)

Hmm. The conditional jump occurs in line 2828 of the c-file, but this is:

  /* "sage/misc/bounded_integer_sequences.pyx":398
 *             n += S1.itembitsize
 *     mpz_clear(tmp)
 *     return -1             # <<<<<<<<<<<<<<
 * 
 * cdef int index_biseq(biseq_t S, int item, size_t start) except -2:
 */   #### Here is 2828
  __pyx_r = -1;
  goto __pyx_L0;

Hm. Volker, could you provide the location of the conditional jump that is giving you the warning?

Last edited 3 years ago by SimonKing

comment:153 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work
  • Work issues set to Rework by using Sage's existing bitset code

I had a chat with Nathann today. He asked me about the data structure used by GMP, which is mainly long* (aka "limb"), plus some information on the number of limbs used and the number of limbs allocated. long* is also used by sage.misc.bitset.

What I am implementing here is in fact quite similar to bitsets (one item of a bounded integer sequence corresponds to a chunk of bits, but otherwise it is the same). So, he suggested not duplicating the code, but sharing it.

I have looked at the bitset code in the past, but apparently it was long before I implemented bounded integer sequences: Now I realize that much of what I do here is actually done in sage.misc.bitset! So, definitely I should reuse the code. Hence, marking it as "needs work" for now, as the code will totally change.

Suggestion: I'll add to sage.misc.bitset what is not already there, and then sage.misc.bounded_integer_sequence will not provide all the boilerplate code, but only provide the Python layer to comfortably use bounded integer sequences.

comment:154 Changed 3 years ago by SimonKing

I do need a bit more for biseq_t than what is provided by bitset_t. For example, I need to store how many bits are required to store one item of the list. But anyway, I will now try to rewrite my code.
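
Schematically, the extended structure could look as follows (a sketch only; field names guessed from the inspect() output in comment:108, certainly not the final layout):

cdef struct biseq_s:
    bitset_t data            # the raw bits, managed by the bitset code
    mp_bitcnt_t itembitsize  # number of bits per item
    mp_limb_t mask_item      # (1 << itembitsize) - 1
    mp_size_t length         # number of items in the sequence

ctypedef biseq_s biseq_t[1]  # mimic the mpz_t / bitset_t calling convention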

comment:155 Changed 3 years ago by git

  • Commit changed from 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a to 47c639d55d3c07670b8b52a629f7cf86a8beb7ce

Branch pushed to git repo; I updated commit sha1. New commits:

bfd7898 Merge branch 'develop' into t/15820/ticket/15820, since apparently the bitset code changed
47c639d Rewrite bounded integer sequences using sage.misc.bitset

comment:156 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review
  • Work issues Rework by using Sage's existing bitset code deleted

At Nathann's suggestion, I am now reusing part of the existing code from sage.misc.bitset. The main advantage is that (in contrast to the mpz_* functions) leading zeroes are not silently removed, which means that I could avoid many special cases.

Hopefully Nathann will find the new version of the code a lot less frightening than the old version...

Needs review again, and soonish I will update the timings that have occasionally been mentioned in comments above.

comment:157 Changed 3 years ago by SimonKing

Let us compare the performance tests from comment:83 (also adding tests for pickling) with the new code version (I am testing on the same laptop, thus, hopefully a comparison makes sense):

sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.15 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.35 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
1000000 loops, best of 3: 1.88 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 70.5 µs per loop

--> All became faster

sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.43 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 1.84 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.41 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 304 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 666 ns per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.2 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.77 µs per loop
sage: %timeit x = S3.list()
10000 loops, best of 3: 112 µs per loop

--> Became faster at least for short sequences, which is the case I am mainly interested in

sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.49 µs per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.52 µs per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.52 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.29 µs per loop

--> All became faster

sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.5 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 62.2 µs per loop

--> Became faster for the shorter sequence

sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 340 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 345 ns per loop

--> All became faster

sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 210 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 211 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 368 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 200 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 199 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 542 ns per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 219 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 205 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 2.33 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 963 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 201 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.36 ms per loop

--> All but the last became faster, which is excellent for my applications, as I am using subsequence containment tests fairly often.

Additional timings for pickling, comparing Python lists with bounded integer sequences:

sage: %timeit loads(dumps(L0))
10000 loops, best of 3: 39.5 µs per loop
sage: %timeit loads(dumps(L1))
10000 loops, best of 3: 43.9 µs per loop
sage: %timeit loads(dumps(L2))
10000 loops, best of 3: 65.5 µs per loop
sage: %timeit loads(dumps(L3))
100 loops, best of 3: 2.13 ms per loop
sage: %timeit loads(dumps(S0))
10000 loops, best of 3: 68.7 µs per loop
sage: %timeit loads(dumps(S1))
10000 loops, best of 3: 72.1 µs per loop
sage: %timeit loads(dumps(S2))
10000 loops, best of 3: 87 µs per loop
sage: %timeit loads(dumps(S3))
1000 loops, best of 3: 1.06 ms per loop

--> Pickling of short bounded integer sequences has an overhead, but longer sequences are pickled faster, due to the more compact representation.

I think that one can say that the new code version results in an overall performance gain, and I suppose it is less frightening now.

Last edited 3 years ago by SimonKing

comment:158 Changed 3 years ago by git

  • Commit changed from 47c639d55d3c07670b8b52a629f7cf86a8beb7ce to e3260e2d9f5bf0483770cc5235c00cd24c4eb310

Branch pushed to git repo; I updated commit sha1. New commits:

e3260e2 Typographical improvements

comment:159 Changed 3 years ago by git

  • Commit changed from e3260e2d9f5bf0483770cc5235c00cd24c4eb310 to aac3a049805ab29aea8a0d3439c1eadf6605eab1

Branch pushed to git repo; I updated commit sha1. New commits:

aac3a04: Fix cornercase in unpickling

comment:160 Changed 3 years ago by git

  • Commit changed from aac3a049805ab29aea8a0d3439c1eadf6605eab1 to b182ba27bdaeb0f066daf1b575db3588b3195f89

Branch pushed to git repo; I updated commit sha1. New commits:

b182ba2: Use mpn_l/rshift only with nonzero shift

comment:161 follow-up: Changed 3 years ago by jdemeyer

Just randomly looking through this patch...

I don't like to put this in src/sage/misc. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset also should be in there). What do you think of src/sage/data_structures?

comment:162 follow-up: Changed 3 years ago by jdemeyer

Also, instead of allocate_biseq(), I would use biseq_init() to be more analogous to mpz_init() and bitset_init(). More generally, replace foo_biseq() by biseq_foo().

comment:163 follow-up: Changed 3 years ago by jdemeyer

Please don't cdef extern from "gmp.h", use from sage.libs.gmp.mpn cimport * or something.

comment:164 in reply to: ↑ 161 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Just randomly looking through this patch...

I don't like to put this in src/sage/misc. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset also should be in there). What do you think of src/sage/data_structures?

I think this question has been discussed on another ticket that I authored. My impression from the discussion was that "structural" stuff should either go to sage/structure or sage/misc. It should be the former, if it is about structures that only make sense in Sage (Parents, elements, ...), whereas sage/misc is for structure that would also make sense without Sage. I thought that bounded integer sequences belong to the latter, and I would also think that TripleDict and MonoDict should better be in sage/misc (the only reason for them being in sage/structure is the fact that they were introduced for coercion, which is Sage specific).

So, I think sage/misc is a good place. But I wouldn't mind to introduce a third structural module, sage/data_structures. On a different ticket?

comment:165 in reply to: ↑ 162 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Also, instead of allocate_biseq(), I would use biseq_init() to be more analogous to mpz_init() and bitset_init(). More generally, replace foo_biseq() by biseq_foo().

Ok, makes sense.

comment:166 in reply to: ↑ 163 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Please don't cdef extern from "gmp.h", use from sage.libs.gmp.mpn cimport * or something.

Is it available from there? If it is, then I should certainly do it in that way.

comment:167 in reply to: ↑ 166 ; follow-up: Changed 3 years ago by SimonKing

Replying to SimonKing:

Replying to jdemeyer:

Please don't cdef extern from "gmp.h", use from sage.libs.gmp.mpn cimport * or something.

Is it available from there? If it is, then I should certainly do it in that way.

mp_bits_per_limb is not available from there, hence, I still import it from gmp.h.

For now I will not introduce a new sage.data_structures.

After running tests, I will push a commit that address your other comments.

comment:168 Changed 3 years ago by git

  • Commit changed from b182ba27bdaeb0f066daf1b575db3588b3195f89 to 1f5a53e7b23e406c553d789f83604cd4a7f90655

Branch pushed to git repo; I updated commit sha1. New commits:

1f5a53e: Change the naming schemes of functions according to conventions in gmp and bitset

comment:169 in reply to: ↑ 164 ; follow-ups: Changed 3 years ago by jdemeyer

Replying to SimonKing:

On a different ticket?

It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else (requiring changing import statements and everything). If it were an existing module, things would be different.

The problem with src/sage/misc is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category. But it's clear that there are a lot of modules defining data structures in misc (but also other stuff like temporary_file). I think it makes sense to group these in src/sage/data_structures (or another name, whatever) instead of dumping everything in misc.

comment:170 in reply to: ↑ 167 Changed 3 years ago by jdemeyer

Replying to SimonKing:

mp_bits_per_limb is not available from there, hence, I still import it from gmp.h.

No, do cimport it from sage.libs.gmp.types and add the declaration there.

comment:171 Changed 3 years ago by jdemeyer

This documentation of ctypedef struct biseq should be expanded. When reading it, it is really not clear to me what the various fields mean (in particular, the comment "bla&mask_item greps the item that bla starts with" makes no sense to me) and how the data structure works. If you're trying too hard to get everything on one line, use

ctypedef struct biseq:
    # comment about data
    bitset_t data

    # comment about itembitsize
    # comment about itembitsize
    # comment about itembitsize
    mp_bitcnt_t itembitsize
...
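For illustration, per-field comments in that style could look roughly as follows (a sketch; the field meanings are inferred from the discussion here, not copied from the code):

ctypedef struct biseq:
    # bitset storing the concatenation of all items,
    # one item every itembitsize bits
    bitset_t data

    # number of bits used to store one item;
    # should be in [1 .. GMP_LIMB_BITS]
    mp_bitcnt_t itembitsize

    # mask with the itembitsize lowest bits set; for a bit pattern
    # bla, bla & mask_item extracts the item that bla starts with
    mp_limb_t mask_item

    # number of items in the sequence
    mp_size_t length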
Last edited 3 years ago by jdemeyer (previous) (diff)

comment:172 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:173 Changed 3 years ago by jdemeyer

first_bits_equal -> biseq_first_bits_equal

comment:174 follow-up: Changed 3 years ago by jdemeyer

Also: I think you use int in a lot of places where you really want a long or mp_limb_t or mp_size_t or Py_ssize_t.

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:175 Changed 3 years ago by jdemeyer

list_to_biseq seems to leak memory if the biseq_t was already initialized. I don't mind, but it should be documented, since it's not obvious without reading the code.

Also in list_to_biseq, the branch if ln>mp_bits_per_limb: leaks tmp.

comment:176 in reply to: ↑ 169 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

On a different ticket?

It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else

Except that the new ticket is not just about moving sage.misc.bounded_integer_sequences to sage.data_structures.bounded_integer_sequences, but moves a whole lot more.

On THIS ticket, it does not make sense to do the big move from sage.misc to sage.data_structures, since this ticket is about introducing bounded integer sequences.

comment:177 in reply to: ↑ 169 Changed 3 years ago by nthiery

Replying to jdemeyer:

The problem with src/sage/misc is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category.

Yeah, and most of the time this is stuff that does not really belong to Sage in the first place, and should eventually be moved to upstream libraries. In some sense, it can be seen as a feature that it looks wrong there: it gives an incentive for developers to finally take on the task of actually moving the stuff upstream.

comment:178 in reply to: ↑ 174 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Also: I think you use int in a lot of places where you really want a long or mp_limb_t or mp_size_t or Py_ssize_t.

I tried to be consistent. However, I cannot find documentation: what is the purpose of mp_size_t versus mp_bitcnt_t?

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:179 in reply to: ↑ 176 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

On THIS ticket, it does not make sense to do the big move from sage.misc to sage.data_structures

Of course not. But I meant to add the new module bounded_integer_sequences to sage.data_structures (the result being that this would be the only module inside sage.data_structures). Putting it in the correct place immediately is better than first putting it somewhere else and then moving it.

comment:180 Changed 3 years ago by SimonKing

Strangely, in gmp.h, I see that mp_size_t is long long int, hence it is a signed type, in contrast to C's size_t, which is unsigned.

comment:181 in reply to: ↑ 179 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

On THIS ticket, it does not make sense to do the big move from sage.misc to sage.data_structures

Of course not. But I meant to add the new module bounded_integer_sequences to sage.data_structures (the result being that this would be the only module inside sage.data_structures).

OK. This I could do.

comment:182 follow-up: Changed 3 years ago by jdemeyer

Replying to jdemeyer:

I tried to be consistent.

The problem is that mp_bits_per_limb is about the number of bits in mp_limb_t, not the number of bits in int. And using int for indices is inconsistent with Python which uses Py_ssize_t (which should be the same as mp_size_t).

comment:183 follow-up: Changed 3 years ago by jdemeyer

I would say: use mp_size_t for indices, mp_limb_t for the actual integers inside your sequences.

comment:184 in reply to: ↑ 182 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to jdemeyer:

I tried to be consistent.

The problem is that mp_bits_per_limb is about the number of bits in mp_limb_t, not the number of bits in int. And using int for indices is inconsistent with Python which uses Py_ssize_t (which should be the same as mp_size_t).

OK. So, I should use mp_size_t for indices. But what should I use mp_bitcnt_t for?

comment:185 in reply to: ↑ 183 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

I would say: use mp_size_t for indices, mp_limb_t for the actual integers inside your sequences.

If I am not mistaken, I always use mp_limb_t for the limbs. If you find a spot where this is not the case, please tell me.

comment:186 in reply to: ↑ 184 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

But what should I use mp_bitcnt_t for?

Any time you need a count of bits (which is always positive!)...

comment:187 in reply to: ↑ 185 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

If you find a spot where this is not the case, please tell me.

cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
     ^^^

comment:188 Changed 3 years ago by jdemeyer

I would say: when in doubt between mp_size_t and mp_bitcnt_t, use mp_size_t.

When in doubt between int and anything else: use the anything else.

comment:189 in reply to: ↑ 186 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

But what should I use mp_bitcnt_t for?

Any time you need a count of bits (which is always positive!)...

You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number b in limb number l, then both b and l should be mp_size_t? Currently, I try to consistently use mp_bitcnt_t for b and mp_size_t for l.

Last edited 3 years ago by SimonKing (previous) (diff)

comment:190 in reply to: ↑ 187 ; follow-ups: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

If you find a spot where this is not the case, please tell me.

cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
     ^^^

Rationale: This function does not return a limb, but only one item (and a limb may comprise many items). To keep them apart, I chose a different type. Moreover, I consider mp_limb_t as a type that belongs to gmp. But the return type of that function should belong to the "Python world".

comment:191 Changed 3 years ago by mguaypaq

  • Cc mguaypaq removed

comment:192 in reply to: ↑ 190 Changed 3 years ago by jdemeyer

Replying to SimonKing:

I consider mp_limb_t as a type that belongs to gmp. But the return type of that function should belong to the "Python world".

int doesn't belong to the Python world either (don't confuse Cython int with Python int!)

comment:193 in reply to: ↑ 189 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Replying to jdemeyer:

Replying to SimonKing:

But what should I use mp_bitcnt_t for?

Any time you need a count of bits (which is always positive!)...

You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number b in limb number l, then both b and l should be mp_size_t? Currently, I try to consistently use mp_bitcnt_t for b and mp_size_t for l.

Using mp_bitcnt_t for this is fine; when I said "index" I meant an index as for Python's __getitem__

comment:194 in reply to: ↑ 190 Changed 3 years ago by jdemeyer

Replying to SimonKing:

To keep them apart, I chose a different type.

If that's the main motivation, define a new type:

ctypedef mp_limb_t biseq_item_t

comment:195 Changed 3 years ago by jdemeyer

Also, if bound is of type mp_limb_t, then the following check is not needed at all:

mpz_init_set_ui(tmp, bound-1)
ln = mpz_sizeinbase(tmp, 2)
if ln>mp_bits_per_limb:
    raise ValueError("The integer bound {} does not fit into {}".format(bound, mp_bits_per_limb))

comment:196 follow-up: Changed 3 years ago by SimonKing

One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.

comment:197 in reply to: ↑ 196 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.

That's not one more reason to use int, it's one less reason to use mp_limb_t. With mp_limb_signed_t (which isn't declared but should be in src/sage/libs/gmp/types.pxd), you shouldn't have this problem.

comment:198 Changed 3 years ago by jdemeyer

But with a better specification of the biseq struct (see comment:171) I might give a better recommendation about which types to use.

comment:199 in reply to: ↑ 197 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

One more reason for using "int" as return type: If I use mp_limb_t, then 6 is printed as 6L.

That's not one more reason to use int, it's one less reason to use mp_limb_t. With mp_limb_signed_t (which isn't declared but should be in src/sage/libs/gmp/types.pxd), you shouldn't have this problem.

There may be a problem with either int or mp_limb_signed_t: If the bound for the integer sequence is very large, then a single item would fill a whole mp_limb_t. However, this would not fit into a signed int/limb.

So, at the moment, it seems to me that ctypedef mp_limb_t biseq_item_t is preferred. Since the __repr__ method is not time-critical, I suggest doing the transformation from mp_limb_t to a nicely printed type there. I would accept that __getitem__ returns something that is not so nicely printed.

comment:200 Changed 3 years ago by git

  • Commit changed from 1f5a53e7b23e406c553d789f83604cd4a7f90655 to ef69d1e6eb30a9551bb45d1e91f6180454d78b86

Branch pushed to git repo; I updated commit sha1. New commits:

ef69d1e: Create sage.data_structures, move biseq into it, and amend types in biseq

comment:201 Changed 3 years ago by git

  • Commit changed from ef69d1e6eb30a9551bb45d1e91f6180454d78b86 to 83be138062f619a237e86650807c44b38eec94e5

Branch pushed to git repo; I updated commit sha1. New commits:

83be138: Reword documentation of biseq_s

comment:202 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

I hope my preceding comments do address your concerns, Jeroen. One more thing: Bitset has cdef struct bitset_s and ctypedef bitset_s bitset_t[1]. To be consistent with it, I changed biseq into biseq_s, so that I now have cdef struct biseq_s and ctypedef biseq_s biseq_t[1].

comment:203 Changed 3 years ago by jdemeyer

In the "documentation" of biseq_s, this sentence is still way too vague to be useful: Use bitsets to store the data, which in turn is based on GMP integers. Like: what is "the data" and how is it stored in the bitset?

And it's not really true that bitset is based on GMP integers. It's just an array of limbs but we use GMP/MPIR mpn_ functions for some manipulations.

comment:204 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:205 Changed 3 years ago by jdemeyer

Are you using anything from NTL? If not, why the

include "sage/libs/ntl/decl.pxi"

comment:206 Changed 3 years ago by git

  • Commit changed from 83be138062f619a237e86650807c44b38eec94e5 to 1fb819cb133e2b94bfc9222585b8762ee5efad21

Branch pushed to git repo; I updated commit sha1. New commits:

1fb819c: Remove a needless cimport, clarify documentation of biseq_s

comment:207 follow-up: Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

The import from ntl has indeed been useless.

Is the description of biseq_s clear enough now?

comment:208 in reply to: ↑ 207 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

Is the description of biseq_s clear enough now?

Add that the items in the sequence are non-negative integers and that itembitsize is in [1 .. GMP_LIMB_BITS] (assuming both these statements are true).

And I guess that mask_item equals limb_lower_bits_up(itembitsize), where limb_lower_bits_up is the function defined in src/sage/misc/bitset.pxi.

comment:209 Changed 3 years ago by git

  • Commit changed from 1fb819cb133e2b94bfc9222585b8762ee5efad21 to e166d38d8be72453d42325c5f6a73592ddeb3827

Branch pushed to git repo; I updated commit sha1. New commits:

e166d38: Further clarification of biseq_s doc.

comment:210 in reply to: ↑ 208 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Is the description of biseq_s clear enough now?

Add that the items in the sequence are non-negative integers and that itembitsize is in [1 .. GMP_LIMB_BITS] (assuming both these statements are true).

Done.

And I guess that mask_item equals limb_lower_bits_up(itembitsize), where limb_lower_bits_up is the function defined in src/sage/misc/bitset.pxi.

It could be that you just spotted a bug (corner case). I define it as ((<mp_limb_t>1)<<itembitsize)-1, which is limb_lower_bits_down. That is to say: if itembitsize is mp_bits_per_limb (which should be allowed), then the mask is 0, but it should have all bits set to 1.

Trying to construct a test to see if it really is a bug...


New commits:

e166d38: Further clarification of biseq_s doc.

comment:211 Changed 3 years ago by SimonKing

Yes, it is a bug:

sage: B = BoundedIntegerSequence(2^32-1, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: B
<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>

Thank you for spotting it!
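To make the corner case explicit, here is a minimal sketch of a safe mask computation (the helper name item_mask is invented for illustration):

# current definition (limb_lower_bits_down style); if itembitsize
# equals the limb width, the shift wraps around and the mask is 0:
#     mask_item = ((<mp_limb_t>1) << itembitsize) - 1

cdef inline mp_limb_t item_mask(mp_bitcnt_t itembitsize):
    # special-case a full limb: the mask must have all bits set
    if itembitsize == mp_bits_per_limb:
        return <mp_limb_t>(-1)
    return ((<mp_limb_t>1) << itembitsize) - 1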

comment:212 follow-up: Changed 3 years ago by jdemeyer

Hang on, I'm working on a reviewer patch.

comment:213 Changed 3 years ago by jdemeyer

Question: what is the reason that you allow a length of 0, requiring lots of special cases in the code? I.e. why not replace

    if S.length:
        if R.length:
            bitset_realloc(R.data, S.data.size)
        else:
            bitset_init(R.data, S.data.size)
        mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)
    elif R.length:
        bitset_free(R.data)

by

    bitset_realloc(R.data, S.data.size)
    mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)

comment:214 follow-up: Changed 3 years ago by jdemeyer

I see, bitset doesn't allow a length of 0. Perhaps this (artificial?) limitation on bitsets should be removed first...

comment:215 Changed 3 years ago by git

  • Commit changed from e166d38d8be72453d42325c5f6a73592ddeb3827 to 3d293b96fa5a584684efbaae8fe667b885618797

Branch pushed to git repo; I updated commit sha1. New commits:

3d293b9: Fixing a corner case

comment:216 in reply to: ↑ 212 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Hang on, I'm working on a reviewer patch.

Sorry, I pushed already.

comment:217 follow-up: Changed 3 years ago by jdemeyer

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as a special case.

comment:218 in reply to: ↑ 214 Changed 3 years ago by SimonKing

Replying to jdemeyer:

I see, bitset doesn't allow a length of 0.

Exactly. I do need bounded integer sequences of length zero, but then I cannot initialise the corresponding bitset (since it is required to have positive length). Hence the special cases.

comment:219 in reply to: ↑ 216 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

Replying to SimonKing:

Sorry, I pushed already.

No problem, it's just a small fix. Can you leave the branch alone for a while now?

comment:220 in reply to: ↑ 217 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as a special case.

I am not sure if I want this, since in some cases (at least in my applications) I ask self.data.size for the bitlength of the sequence, and thus I would need other special cases there.

comment:221 Changed 3 years ago by SimonKing

OTOH, I just see that in these cases (e.g., concatenation of biseq) I already have special cases for length 0. So, it would not be a loss to re-organise the special casing...

comment:222 in reply to: ↑ 220 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Replying to jdemeyer:

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.

I am not sure if I want this, since in some cases (at least in my applications) I ask self.data.size for the bitlength of the sequence, and thus I would need other special cases there.

I would prefer not to assume that data.size == length * itembitsize. For example, it might turn out to be more efficient to store some extra bits (say, one extra limb).

comment:223 Changed 3 years ago by jdemeyer

I will rename list_to_biseq -> biseq_init_list if you don't mind.

comment:224 Changed 3 years ago by jdemeyer

I would also prefer to remove biseq_copy since it's a dangerous function: it assumes that the biseq has been allocated but it throws away the current contents.

Better to split this operation in two: changing biseq_copy to biseq_init_copy assuming it was not allocated. If you really want something like biseq_copy, just use biseq_dealloc and biseq_init_copy.
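In other words, copying into an already initialised sequence would then be spelled as follows (sketch):

biseq_dealloc(R)
biseq_init_copy(R, S)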

comment:225 Changed 3 years ago by jdemeyer

  • Dependencies set to #17195

comment:226 follow-up: Changed 3 years ago by SimonKing

Why should there be a dependency on the Cython upgrade?

comment:227 Changed 3 years ago by jdemeyer

  • Dependencies changed from #17195 to #17195, #17196

comment:228 in reply to: ↑ 226 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Why should there be a dependency on cython upgrade?

Only the newest Cython has declarations for the PySlice API, so you can use PySlice_GetIndicesEx() easily.
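For reference, a minimal sketch of how the slice normalisation could then look (the helper name slice_bounds is invented; it assumes a biseq_t with a length field):

from cpython.slice cimport PySlice_GetIndicesEx

cdef slice_bounds(biseq_t S, sl):
    # normalise a Python slice object against the sequence length
    cdef Py_ssize_t start, stop, step, slicelength
    PySlice_GetIndicesEx(sl, S.length, &start, &stop, &step, &slicelength)
    return (start, stop, step, slicelength)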

comment:229 follow-up: Changed 3 years ago by SimonKing

  • Dependencies changed from #17195, #17196 to #17195

As far as I know, I don't use the bitset functions for which there is a requirement of equality of size. Hence, I don't think #17196 is a dependency.

comment:230 in reply to: ↑ 229 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

As far as I know, I don't use the bitset functions for which there is a requirement of equality of size.

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

comment:231 in reply to: ↑ 230 ; follow-up: Changed 3 years ago by jdemeyer

Replying to jdemeyer:

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

For example, biseq_concat is broken in the case where (S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb.

comment:232 in reply to: ↑ 231 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to jdemeyer:

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

For example, biseq_concat is broken in the case where (S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb.

Really? OK, I'll try to find an example that exposes the problem.

comment:233 Changed 3 years ago by SimonKing

sage: B = BoundedIntegerSequence(2^30-1, [1])
sage: B
<1>
sage: B+B
<1, 1>

Where's the problem?

comment:234 Changed 3 years ago by jdemeyer

Could you try again with a larger value of 1?

comment:235 Changed 3 years ago by SimonKing

Here it is:

sage: B = BoundedIntegerSequence(2^30-1, [2^30-1])
sage: B
<1073741823>
sage: B+B
<1073741823, 3>

OK, this requires a fix.

comment:236 follow-up: Changed 3 years ago by SimonKing

Do I understand correctly: At #17196, shift and comparison operations should be introduced that behave well wrt bitsizes, and then these operations should replace the mpn_... that I am using here.

comment:237 in reply to: ↑ 236 Changed 3 years ago by jdemeyer

  • Dependencies changed from #17195 to #17195, #17196

Replying to SimonKing:

Do I understand correctly: At #17196, shift and comparison operations should be introduced that behave well wrt bitsizes, and then these operations should replace the mpn_... that I am using here.

Yes.

comment:238 Changed 3 years ago by jdemeyer

  • Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
  • Modified changed from 10/22/14 14:14:46 to 10/22/14 14:14:46

comment:239 Changed 3 years ago by jdemeyer

  • Commit changed from 3d293b96fa5a584684efbaae8fe667b885618797 to 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8

In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be a feature?


New commits:

8c69daf: Upgrade Cython to 0.21.1
88e5eca: Merge branch 'ticket/17195' into ticket/15820

comment:240 Changed 3 years ago by jdemeyer

Last commit is totally work in progress.

comment:241 Changed 3 years ago by git

  • Commit changed from 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8 to ab2f0c2bf3b68e4e979650f5554d529e5b7ea625

Branch pushed to git repo; I updated commit sha1. New commits:

ab2f0c2: Various reviewer fixes

comment:242 Changed 3 years ago by jdemeyer

New commits:

ab2f0c2: Various reviewer fixes

comment:243 Changed 3 years ago by git

  • Commit changed from ab2f0c2bf3b68e4e979650f5554d529e5b7ea625 to 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

83f8a56: Relax assumptions for bitset functions
3ae75a1: Move bitset to sage.data_structures
7a95511: Merge branch 'ticket/17196' into HEAD
815a40d: Various reviewer fixes

comment:244 follow-up: Changed 3 years ago by SimonKing

What exactly does cython.overflowcheck do? Check whether the given arguments fit into the prescribed data types without truncation?

Last edited 3 years ago by SimonKing (previous) (diff)

comment:245 in reply to: ↑ 244 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

What exactly does cython.overflowcheck do?

See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication totalbitsize = l * itemsize doesn't overflow.
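A minimal sketch of that usage (the function name total_bitsize is invented for illustration):

cimport cython

@cython.overflowcheck(True)
cdef mp_bitcnt_t total_bitsize(mp_size_t l, mp_size_t itemsize) except? 0:
    # with overflowcheck, this multiplication raises OverflowError
    # instead of silently wrapping around
    return l * itemsize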

comment:246 Changed 3 years ago by git

  • Commit changed from 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a to 8d16a61d2874a38e10233cdf434db3854bd5ee0d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8d16a61: Lots of fixes for bounded integer sequences

comment:247 Changed 3 years ago by git

  • Commit changed from 8d16a61d2874a38e10233cdf434db3854bd5ee0d to 8d59d141b7090fb6e5f6529b968e224ec0e5f73b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

8d59d14: Lots of fixes for bounded integer sequences

comment:248 Changed 3 years ago by jdemeyer

  • Authors changed from Simon King to Simon King, Jeroen Demeyer

comment:249 in reply to: ↑ 245 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

What exactly does cython.overflowcheck do?

See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication totalbitsize = l * itemsize doesn't overflow.

Aha! So, it does not just check an overflow of the arguments, but it checks overflows happening in the innards of the function. Cool!

comment:250 Changed 3 years ago by git

  • Commit changed from 8d59d141b7090fb6e5f6529b968e224ec0e5f73b to d22c1ca23727402e376c7f772c2f51b8ed6af1f1

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

23762a3: Import data_structures into global namespace
4aafd2f: Merge branch 'ticket/17196' into HEAD
d22c1ca: Lots of fixes for bounded integer sequences

comment:251 in reply to: ↑ 249 Changed 3 years ago by jdemeyer

Replying to SimonKing:

So, it does not just check an overflow of the arguments

In fact, it does not check overflow of the arguments.

Overflow checking of the arguments should happen in calling the function, not in the function itself.

Also note that Cython always does overflow checking when converting from Python types to C types, but not by default between C types.

comment:252 follow-up: Changed 3 years ago by SimonKing

Is sagemath.org down? I cannot build your branch, since downloading the new cython package fails.


New commits:

23762a3: Import data_structures into global namespace
4aafd2f: Merge branch 'ticket/17196' into HEAD
d22c1ca: Lots of fixes for bounded integer sequences


comment:253 in reply to: ↑ 252 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Is sagemath.org down? I can not build your branch, since downloading the new cython package fails.

The new Cython package isn't yet merged, so you have to manually follow the instructions at #17195 (don't forget the renaming!)

comment:254 follow-up: Changed 3 years ago by jdemeyer

My intention for this ticket is to implement an additional function biseq_setitem() to set one item and then implement the biseq functions using either biseq_getitem()/biseq_setitem() or bitset functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.

comment:255 follow-up: Changed 3 years ago by jdemeyer

In case that you missed it, I am copying this question I asked before:

In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

comment:256 in reply to: ↑ 254 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

My intention for this ticket is to implement an additional function biseq_setitem() to set one item and then implement the biseq functions using either biseq_getitem()/biseq_setitem() or bitset functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

comment:257 in reply to: ↑ 256 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

I don't quite understand what you refer to. Which function do you think would become slower?

comment:258 in reply to: ↑ 255 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

In case that you missed it, I am copying this question I asked before:

In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). For instance, with bound 27 the mangled bound is 32, so an input item 100 would be stored as 100 % 32 == 4. It is documented, and it isn't random.

comment:259 in reply to: ↑ 258 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

Replying to jdemeyer:

In case that you missed it, I am copying this question I asked before:

In biseq_init_list(), the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). It is documented, and it isn't random.

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

comment:260 in reply to: ↑ 257 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Replying to SimonKing:

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

I don't quite understand what you refer to. Which function do you think would become slower?

BoundedIntegerSequence.__iter__ and biseq_to_list would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by >> in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by biseq_getitem, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item.

I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb >> operations.
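For concreteness, a rough sketch of the per-limb extraction (assuming, for simplicity, that items do not straddle limb boundaries):

cdef extern from "gmp.h":
    int mp_bits_per_limb

cdef void limb_to_items(mp_limb_t limb, mp_bitcnt_t itembits,
                        mp_limb_t mask, mp_limb_t *out):
    # extract all items packed into one limb using cheap
    # single-limb shifts, instead of one mpn_rshift per item
    cdef int i
    for i in range(mp_bits_per_limb // itembits):
        out[i] = limb & mask
        limb >>= itembits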

comment:261 in reply to: ↑ 259 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

You mean one should just raise an error? OK, that's another possibility.

comment:262 Changed 3 years ago by SimonKing

Why did you introduce

assert S1.itembitsize == S2.itembitsize

?

I think that generally it is not the job of boilerplate functions to do bound checking. If you want to do bound checking, then do so on the level of BoundedIntegerSequence, but not on the level of biseq_t.

comment:263 follow-up: Changed 3 years ago by SimonKing

Perhaps we should consider changing the hash function for bitsets (see the comments for the hash of bounded integer sequences that you removed). But I suppose this should not be done on this ticket.

My summary:

  • We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with the exception of the one assert that you introduced. I vote for removing the assert.
  • I don't think iteration should use biseq_getitem to extract items one by one. Instead, the current idea ("shift enough data so that one limb is full and extract items from this single limb") should be preserved.
  • I like your change to biseq_getitem, where you avoid using mpn_rshift for just one/two limbs. I think the same change could be done in biseq_slice in the case step!=1, and also in iteration.

Other than that, your changes seem fine to me. I suggest doing the "replacement of one/two limb shifts", unless you beat me to it.

comment:264 Changed 3 years ago by SimonKing

  • Reviewers set to Simon King

comment:265 Changed 3 years ago by SimonKing

Concerning the assert: The only place in which biseq_init_concat is used does in fact check compatibility of bounds, and raises a ValueError if they are not compatible. Hence, the assert statement is redundant.

comment:266 Changed 3 years ago by SimonKing

  • Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

comment:267 Changed 3 years ago by SimonKing

  • Commit changed from d22c1ca23727402e376c7f772c2f51b8ed6af1f1 to 41c40cf2c59734e7e665ba9c2350da2930c9cc8e
  • Status changed from needs_work to needs_review

From my perspective, it is good to go...


New commits:

41c40cf: Don't use mpn_rshift for shifting two limbs

comment:268 in reply to: ↑ 263 Changed 3 years ago by jdemeyer

Replying to SimonKing:

  • We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with exception of the one assert that you introduced. I vote for removing the assert.

OK then, no problem.

comment:269 follow-up: Changed 3 years ago by jdemeyer

Let's do some benchmarks:

  1. With the latest commit:
    sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
    sage: B = BoundedIntegerSequence(2, [1]*100000)
    sage: timeit('B.list()')
    625 loops, best of 3: 965 µs per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 1.73 ms per loop
    sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
    sage: timeit('B.list()')
    625 loops, best of 3: 1.1 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 1.91 ms per loop
    
  1. With 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8:
    sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
    sage: B = BoundedIntegerSequence(2, [1]*100000)
    sage: timeit('B.list()')
    125 loops, best of 3: 3 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 3.87 ms per loop
    sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
    sage: timeit('B.list()')
    125 loops, best of 3: 3.68 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 4.47 ms per loop
    

comment:270 in reply to: ↑ 260 Changed 3 years ago by jdemeyer

Replying to SimonKing:

BoundedIntegerSequence.__iter__ and biseq_to_list would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by >> in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by biseq_getitem, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item.

I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb->>.

As you can see, I implemented biseq_getitem() much faster than that, so I doubt there is still a slowdown for using repeated calls to biseq_getitem().

comment:271 in reply to: ↑ 261 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Replying to jdemeyer:

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

You mean one should just raise an error? OK, that's another possibility.

Yes, that is what I mean.

comment:272 follow-up: Changed 3 years ago by SimonKing

I just noticed that you removed biseq_to_list. Why? Is there no speed penalty for using the iterator instead?

comment:273 in reply to: ↑ 272 Changed 3 years ago by jdemeyer

Replying to SimonKing:

I just noticed that you removed biseq_to_list.

Just use [biseq_getitem_py(self.data, i) for i in range(self.data.length)]

comment:274 Changed 3 years ago by jdemeyer

Also, I think that biseq_to_list is something which belongs in the Python interface, not the C interface.

comment:275 in reply to: ↑ 269 Changed 3 years ago by SimonKing

I only saw this comment after posting my previous comment.

Replying to jdemeyer:

Let's do some benchmarks:

  1. With the latest commit:
    sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
    sage: B = BoundedIntegerSequence(2, [1]*100000)
    sage: timeit('B.list()')
    625 loops, best of 3: 965 µs per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 1.73 ms per loop
    sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
    sage: timeit('B.list()')
    625 loops, best of 3: 1.1 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 1.91 ms per loop
    
  1. With 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8:
    sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
    sage: B = BoundedIntegerSequence(2, [1]*100000)
    sage: timeit('B.list()')
    125 loops, best of 3: 3 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 3.87 ms per loop
    sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
    sage: timeit('B.list()')
    125 loops, best of 3: 3.68 ms per loop
    sage: timeit('list(B)')
    125 loops, best of 3: 4.47 ms per loop
    

OK, excellent, then I think it is ok to drop biseq_to_list.

comment:276 Changed 3 years ago by SimonKing

If I understand correctly, the only remaining issue is whether we silently truncate the input, or raise an error if the given items are too large. I have no preference here, so, you may make it a reviewer patch.

comment:277 follow-up: Changed 3 years ago by SimonKing

Sigh. After merging this into #16453, I get several segmentation faults.

comment:278 in reply to: ↑ 277 Changed 3 years ago by SimonKing

Replying to SimonKing:

Sigh. After merging this into #16453, I get several segmentation faults.

But fortunately this ticket is not to blame for it :-)

comment:279 follow-up: Changed 3 years ago by jdemeyer

I am going to leave this ticket alone for at least a few days.

I do think there is still a lot of improvement, mainly simplification:

  1. Implement biseq_setitem() and use that in a lot of places.
  2. Implement mpn_equal_bits_shifted() to compare bit patterns efficiently for equality and use this for biseq_max_overlap and biseq_contains.

comment:280 Changed 3 years ago by jdemeyer

  • Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
  • Modified changed from 10/23/14 14:20:11 to 10/23/14 14:20:11

comment:281 Changed 3 years ago by jdemeyer

  • Commit changed from 41c40cf2c59734e7e665ba9c2350da2930c9cc8e to 9056cf636b65d30c5c89bc151c7740c1ea1143d3

New commits:

9056cf6: Fix handling of bound in biseq_init_list

comment:282 in reply to: ↑ 279 Changed 3 years ago by SimonKing

Replying to jdemeyer:

  1. Implement biseq_setitem() and use that in a lot of places.

BoundedIntegerSequence was (originally) thought of as being immutable (it is hashable). Hence, so far, I only see application in biseq_init_list and biseq_slice. OK, that's something...

  1. Implement mpn_equal_bits_shifted() to compare bit patterns efficiently for equality and use this for biseq_max_overlap and biseq_contains.

+1.

comment:283 follow-up: Changed 3 years ago by SimonKing

Would it be acceptable that biseq_setitem(S,index,item) assumes that S[index] is zero before the assignment? This condition holds in the two applications I can think of.

Of course, I could additionally provide a function that sets a previously existing item to zero.

Last edited 3 years ago by SimonKing (previous) (diff)

comment:284 in reply to: ↑ 283 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Would it be acceptable that biseq_setitem(S,index,item) assumes that S[index] is zero before the assignment?

Then I would call it biseq_inititem(S, index, item).

comment:285 Changed 3 years ago by SimonKing

  • Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

comment:286 follow-up: Changed 3 years ago by jdemeyer

  • Commit changed from 9056cf636b65d30c5c89bc151c7740c1ea1143d3 to e9c779ae629dbec356bb8546e3974e2a67050051

It will be more efficient to declare biseq_inititem as cdef inline void.


New commits:

e9c779a: Simplify code by using biseq_getitem/biseq_inititem

comment:287 Changed 3 years ago by jdemeyer

Concerning enumerate(): I have no idea if Cython optimizes this. If not, you'd better just keep track of the index manually.

comment:288 follow-up: Changed 3 years ago by jdemeyer

Why did you change if item_limb > bound to if item > bound? The former is a C comparison, therefore faster.

comment:289 in reply to: ↑ 286 Changed 3 years ago by jdemeyer

Replying to jdemeyer:

It will be more efficient to declare biseq_inititem as cdef inline void.

More generally: if you have cdef or cpdef functions returning nothing and you don't require exception handling, then using void as return type is more efficient (without it, the functions return the Python value None).

comment:290 in reply to: ↑ 288 ; follow-up: Changed 3 years ago by SimonKing

Replying to jdemeyer:

Why did you change if item_limb > bound to if item > bound? The former is a C comparison, therefore faster.

Sure. But you did item_limb = item without testing whether item > bound, which means that item_limb could end up below the bound. Hence, you'd have the random truncation that you wanted to avoid.

comment:291 in reply to: ↑ 290 Changed 3 years ago by jdemeyer

Replying to SimonKing:

But you did item_limb = item without testing whether item > bound, which means that item_limb could end up below the bound.

Not true because the assignment item_limb = item is a Python -> C conversion which is always checked:

sage: BoundedIntegerSequence(100, [2^256])
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-ba7f864a5712> in <module>()
----> 1 BoundedIntegerSequence(Integer(100), [Integer(2)**Integer(256)])

/usr/local/src/sage-config/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence.__init__ (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7798)()

/usr/local/src/sage-config/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_list (build/cythonized/sage/data_structures/bounded_integer_sequences.c:5706)()

OverflowError: long int too large to convert

(perhaps add this as a doctest)

comment:292 Changed 3 years ago by SimonKing

I repeated the timings from comment:157, but still with the item<=bound test (so, initialisation time might improve).

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.45 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
100000 loops, best of 3: 2.1 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 4.28 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
1000 loops, best of 3: 300 µs per loop

--> Became much slower

sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.44 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.18 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.29 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 286 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 697 ns per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.2 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.78 µs per loop
sage: %timeit x = S3.list()
10000 loops, best of 3: 135 µs per loop

--> Not very conclusive

sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 815 ns per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 846 ns per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 845 ns per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 1.62 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 1.55 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 49.8 µs per loop

--> Became much faster

sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 346 ns per loop

--> Perhaps became slower.

What does that all mean?

For my applications, not very much, actually: I do not care much about initialisation from lists, direct item access, or slices with a step different from one. But I do care about:

  • Testing for subsequence containment, comparison and hash
  • Slicing with step one
  • Concatenation.

So, I leave it up to you whether clarity of code is more important than the above slow-down...

Last edited 3 years ago by SimonKing (previous) (diff)

comment:293 follow-up: Changed 3 years ago by jdemeyer

If you do timings, please keep in mind comment:286.

comment:294 Changed 3 years ago by SimonKing

Right. When using item_limb, the timings for initialisation from lists become good:

sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.27 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.51 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
1000000 loops, best of 3: 1.88 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 65.5 µs per loop

comment:295 in reply to: ↑ 293 Changed 3 years ago by SimonKing

Replying to jdemeyer:

If you do timings, please keep in mind comment:286.

The timings are after using inline void.

comment:296 Changed 3 years ago by git

  • Commit changed from e9c779ae629dbec356bb8546e3974e2a67050051 to 7d258591bab37466eb6392443ed2e84cf699102c

Branch pushed to git repo; I updated commit sha1. New commits:

acf2a75: Declare some functions as 'cdef inline void'
7d25859: Speed-up for testing bounds

comment:297 follow-up: Changed 3 years ago by jdemeyer

I know it's really a detail, but I intentionally used repr(item) in the error message

raise ValueError("list item %r larger than %s"%(item, bound) )

This way, the error message refers to the list item as given by the user, which might be some strange Python type which prints differently.

comment:298 Changed 3 years ago by jdemeyer

Something else: the implementation of biseq_getitem_py using PyInt_FromSize_t gives a good reason to have biseq_item_t == size_t. Because of this, I would propose to remove the biseq_item_t type again and replace it by an explicit size_t.
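That is, something along these lines (sketch):

cdef extern from "Python.h":
    object PyInt_FromSize_t(size_t ival)

cdef inline object biseq_getitem_py(biseq_t S, mp_size_t index):
    # convert the C-level item directly into a Python int
    return PyInt_FromSize_t(biseq_getitem(S, index))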

comment:299 Changed 3 years ago by git

  • Commit changed from 7d258591bab37466eb6392443ed2e84cf699102c to 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e

Branch pushed to git repo; I updated commit sha1. New commits:

8a10b73: Use plain size_t, not biseq_item_t

comment:300 Changed 3 years ago by git

  • Commit changed from 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e to fa7cde09b93359371f236ce6da13bb8f5a947a08

Branch pushed to git repo; I updated commit sha1. New commits:

fa7cde0: Use original data in error message

comment:301 Changed 3 years ago by git

  • Commit changed from fa7cde09b93359371f236ce6da13bb8f5a947a08 to 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb

Branch pushed to git repo; I updated commit sha1. New commits:

6e03f19: Code and speed improvement for biseq containment test

comment:302 Changed 3 years ago by SimonKing

I have implemented mpn_equal_bits_shifted. I do not use mpn_rshift followed by mpn_cmp, since that would mean touching each limb twice (once for shifting, once for comparison), even if the difference occurs early in the sequence. Of course, using such a function simplifies the code of biseq_contains and biseq_max_overlap considerably.

Here I redo the containment tests from comment:157, giving timings with the new code and (in brackets) with the previous code (I would not directly compare with comment:157, since the previous code differs too much from the code back at comment:157).

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 190 ns per loop  (384 ns per loop)
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 282 ns per loop  (519 ns per loop)
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 644 ns per loop  (2.1 µs per loop)
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 50.2 µs per loop    (2.36 ms per loop)

So, there is a very clear improvement, and this time it concerns a function that I use very extensively in my applications.

Jeroen, I think that with the new commit I addressed all of your concerns, hence, for now I will move on to the follow-up tickets (i.e., cythoned quiver paths and the non-commutative F5 algorithm that is not on trac yet).

comment:303 in reply to: ↑ 297 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

(never mind)

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:304 Changed 3 years ago by jdemeyer

  • Status changed from needs_work to needs_review

comment:305 Changed 3 years ago by jdemeyer

__iter__ and biseq_index should be implemented in terms of biseq_getitem() (or a good reason given why this is a bad idea).

Last edited 3 years ago by jdemeyer (previous) (diff)

comment:306 Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work

comment:307 Changed 3 years ago by jdemeyer

In biseq_init_list, iteration is about 10% faster the following way:

for item in data:
    item_limb = item
    index += 1

instead of

for index from 0<=index<R.length:
    item_limb = data[index]

comment:308 Changed 3 years ago by jdemeyer

biseq_clearitem must be added to the .pxd file

comment:309 follow-up: Changed 3 years ago by jdemeyer

In biseq_slice, the case step == 1 should also be implemented using bitset primitives.

comment:310 follow-up: Changed 3 years ago by jdemeyer

Should I make these changes or will you? I'm just asking to avoid duplicate work.

comment:311 in reply to: ↑ 310 Changed 3 years ago by SimonKing

Replying to jdemeyer:

Should I make these changes or will you? I'm just asking to avoid duplicate work.

I will try today.

comment:312 follow-up: Changed 3 years ago by SimonKing

While I am at it, I will also make biseq_max_overlap, biseq_index and biseq_contains return mp_size_t, not int. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be mp_size_t.

comment:313 follow-up: Changed 3 years ago by SimonKing

Currently, when calling B.index(i), i is compared with the items of B modulo B.bound(). Taking i modulo the bound is currently done in biseq_index. I suggest moving this to B.index, since generally I want to avoid bound checking etc. in the boilerplate functions.

comment:314 in reply to: ↑ 312 Changed 3 years ago by jdemeyer

Replying to SimonKing:

While I am at it, I will also make biseq_max_overlap, biseq_index and biseq_contains return mp_size_t, not int. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be mp_size_t.

I hadn't noticed, but yes: that should certainly be changed.

comment:315 in reply to: ↑ 313 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

Currently, when calling B.index(i), i is compared with the items of B modulo B.bound(). Taking i modulo the bound is currently done in biseq_index. I suggest moving this to B.index

I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.

comment:316 follow-up: Changed 3 years ago by jdemeyer

In biseq_index, use size_t for item.

comment:317 in reply to: ↑ 316 Changed 3 years ago by SimonKing

Replying to jdemeyer:

In biseq_index, use size_t for item.

Since biseq_getitem returns a size_t, or is there another reason?

comment:318 in reply to: ↑ 315 Changed 3 years ago by SimonKing

Replying to jdemeyer:

I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.

OK, but I'll do this test in BoundedIntegerSequence.index, not in biseq_index.
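In other words, a pure-Python sketch of the intended division of labour (names mirror the branch, but the bodies and the exact range convention are only a model): the low-level routine just scans, and the user-facing method decides that out-of-range items are simply not found.

def biseq_index_model(items, item, start):
    # model of biseq_index: scan only, no bound checking
    for i in range(start, len(items)):
        if items[i] == item:
            return i
    return -1

def index_model(items, bound, item):
    # model of BoundedIntegerSequence.index: items outside the valid
    # range (whatever convention the branch settles on) are not found
    if item < 0 or item >= bound:
        raise ValueError("{} not in sequence".format(item))
    i = biseq_index_model(items, item, 0)
    if i == -1:
        raise ValueError("{} not in sequence".format(item))
    return i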

comment:319 Changed 3 years ago by SimonKing

Should I use biseq_getitem_py both in BoundedIntegerSequence.__iter__ and BoundedIntegerSequence.list? Or should one of them use biseq_getitem for efficiency?

comment:320 in reply to: ↑ 309 Changed 3 years ago by SimonKing

Replying to jdemeyer:

In biseq_slice, the case step == 1 should also be implemented using bitset primitives.

Problem: We would need a bitset primitive that shifts a part of a bitset S. Currently, bitset_rshift shifts all of S, not just a part of S.

I see four ways to proceed:

  • Write a new primitive bitset_rshift_subset, which means code duplication
  • Add an optional argument to bitset_rshift, which means that all places using this function must be changed.
  • Change bitset_rshift so that no assumptions on the sizes of the source and target are needed: we just shift the minimum of the number of limbs available in the source and the number of limbs that fit into the target.
  • Do not use bitset primitives in biseq_slice.

comment:321 follow-up: Changed 3 years ago by SimonKing

PS: My favourite would be to modify bitset_rshift (and bitset_lshift) to make it robust against reading/writing out of bounds, in the sense that bitset_rshift(R,S,n) moves the part of S that fits into R.
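As a pure-Python model of that semantics (integers standing in for bitsets, with bit i of the integer being element i of the set; illustrative only):

def bitset_rshift_model(r_size, s_bits, n):
    # shift the source down by n bits, then truncate to the target's size;
    # bits falling outside the target are silently discarded
    return (s_bits >> n) & ((1 << r_size) - 1)

# no out-of-bounds reads or writes, whatever the sizes involved:
assert bitset_rshift_model(4, 0b1011010, 2) == 0b0110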

Last edited 3 years ago by SimonKing (previous) (diff)

comment:322 Changed 3 years ago by git

  • Commit changed from 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb to b5b066dc218d7fb6e5e13eb19cce4720d4a2d057

Branch pushed to git repo; I updated commit sha1. New commits:

b5b066d: Code simplification for biseq_*, bound check for bitset shifts

comment:323 Changed 3 years ago by git

  • Commit changed from b5b066dc218d7fb6e5e13eb19cce4720d4a2d057 to 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b

Branch pushed to git repo; I updated commit sha1. New commits:

43f78ad: Adding a reference to trac

comment:324 Changed 3 years ago by SimonKing

In the new commits I addressed (I hope) all issues that we discussed yesterday and today.

Timings without the new commits:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.64 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.57 µs per loop
sage: %timeit B[2:-2]
1000000 loops, best of 3: 1.13 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 13 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 31.5 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 55 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.26 µs per loop

And with the new commit:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.6 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.62 µs per loop
sage: %timeit B[2:-2]
1000000 loops, best of 3: 1.12 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 11.5 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 30.8 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 64.9 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.27 µs per loop

Apparently, creation from a list became faster, iteration over a long list became noticeably slower (55 µs vs. 64.9 µs, about 18%; but in my applications I do not iterate over the items of a bounded integer sequence, so I don't care about the slow-down), and everything else did not change significantly.

And the code became clearer, which is a plus. I hope you'll find that my changes to bitset_rshift and bitset_lshift make sense.

comment:325 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

comment:326 in reply to: ↑ 321 Changed 3 years ago by jdemeyer

Given the new code, the comment

Items will not just be compared by congruence modulo the bound of the
sequence, but by equality.

no longer makes sense.

comment:327 Changed 3 years ago by jdemeyer

I would replace the doc

If the result of the shift would exceed the size of ``r``, then it
will be cut.

by something like

There are no assumptions on the sizes of ``a`` and ``r``.
Bits which are shifted outside of the resulting bitset are discarded.

comment:328 Changed 3 years ago by jdemeyer

In

raise ValueError("list item {} larger than {}".format(data[index], bound) )

replace data[index] by item.

comment:329 Changed 3 years ago by git

  • Commit changed from 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b to 7a0dd469527c8a4f39d8d891703fcf5feecda9a1

Branch pushed to git repo; I updated commit sha1. New commits:

7a0dd46: Some improvements of the doc

comment:330 Changed 3 years ago by SimonKing

Done!


New commits:

7a0dd46: Some improvements of the doc

comment:331 Changed 3 years ago by SimonKing

  • Status changed from needs_review to needs_work

I found a bug (probably in slicing):

sage: B1 = BoundedIntegerSequence(8, [0,7])
sage: B2 = BoundedIntegerSequence(8, [2,1,4])
sage: B1[0:1]+B2
<0, 7, 1, 4>
sage: B1[0:1]
<0>

My diagnosis is that the top bits of the slice (those beyond its nominal size) have not been cleared. This doesn't show up when printing, but concatenation does rely on the top bits being cleared.
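One can model the failure in pure Python (illustrative only; 3 bits per item is a toy choice, not necessarily the branch's exact layout): concatenation shifts the second sequence up past the first and ORs them together, so any stray bits of the first sequence above its nominal length get ORed into the first items of the second.

ITEM = 3  # bits per item in this toy model

def pack(items):
    bits = 0
    for i, x in enumerate(items):
        bits |= x << (i * ITEM)
    return bits

def concat(bits1, len1, bits2):
    # models biseq_init_concat: lshift S2 past S1, then OR with S1
    return bits1 | (bits2 << (len1 * ITEM))

good = pack([0])      # slice <0> with top bits cleared
bad = pack([0, 7])    # slice <0> of <0,7>, item 7's bits not cleared
b2 = pack([2, 1, 4])

assert concat(good, 1, b2) == pack([0, 2, 1, 4])
assert concat(bad, 1, b2) == pack([0, 7, 1, 4])   # the stray 7 ORs over the 2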

comment:332 Changed 3 years ago by git

  • Commit changed from 7a0dd469527c8a4f39d8d891703fcf5feecda9a1 to 6723c7e44dbae3118e3dff259fd0e633915b53c9

Branch pushed to git repo; I updated commit sha1. New commits:

6723c7e: Fix highest bits of bitsets after fixing

comment:333 Changed 3 years ago by SimonKing

  • Status changed from needs_work to needs_review

Fixed and tested...


New commits:

6723c7e: Fix highest bits of bitsets after fixing

comment:334 Changed 3 years ago by jdemeyer

For consistency, biseq_slice should be renamed biseq_init_slice.

comment:335 Changed 3 years ago by jdemeyer

  • Reviewers changed from Simon King to Jeroen Demeyer, Simon King

Modulo this comment about biseq_init_slice, I think the public biseq interface looks good.

I have not looked at all the changes of the last few days and I don't feel like doing that now. I do plan to review this ticket after some time, with a fresh look (of course, other reviewers are always welcome). In any case, I think the current branch works well enough that you can use it for your other tickets.

comment:336 Changed 3 years ago by git

  • Commit changed from 6723c7e44dbae3118e3dff259fd0e633915b53c9 to b6ef6d5c5faffe4059b1539f00e25dd98b9481c0

Branch pushed to git repo; I updated commit sha1. New commits:

b6ef6d5: Rename biseq_slice -> biseq_init_slice

comment:337 Changed 3 years ago by jdemeyer

In biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop, int step), the step argument should also be mp_size_t.

comment:338 Changed 3 years ago by git

  • Commit changed from b6ef6d5c5faffe4059b1539f00e25dd98b9481c0 to 0e51c4a8ae1720e301b4d57ee51289de475bf51d

Branch pushed to git repo; I updated commit sha1. New commits:

0e51c4a: Change the type of an argument

comment:339 Changed 3 years ago by jdemeyer

  • Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
  • Modified changed from 10/30/14 10:55:56 to 10/30/14 10:55:56

comment:340 Changed 3 years ago by jdemeyer

  • Commit changed from 0e51c4a8ae1720e301b4d57ee51289de475bf51d to d46a22d394c0c3556bd297427614bbc43140efb3

Resolved easy merge conflict.


New commits:

d46a22d: Merge remote-tracking branch 'origin/develop' into ticket/15820

comment:341 Changed 3 years ago by SimonKing

Can the review be finished? #16453 and #17435 depend on it.

comment:342 Changed 3 years ago by SimonKing

PS: The errors that some patchbots report seem unrelated. And I don't see what the coverage script is complaining about. Coverage of the new stuff is 100%.

comment:343 Changed 3 years ago by git

  • Commit changed from d46a22d394c0c3556bd297427614bbc43140efb3 to 7e5f21754e6c056d71609830729c093ba09b56a6

Branch pushed to git repo; I updated commit sha1. New commits:

7e5f217: Reorganize logic of bitset shifts

comment:344 Changed 3 years ago by git

  • Commit changed from 7e5f21754e6c056d71609830729c093ba09b56a6 to 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664

Branch pushed to git repo; I updated commit sha1. New commits:

d0a5c78: Move bitset_fix() function
0c64618: Documentation fixes

comment:345 Changed 3 years ago by git

  • Commit changed from 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664 to 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf

Branch pushed to git repo; I updated commit sha1. New commits:

93e64fd: Always raise OverflowError if list item is out of bounds
Last edited 3 years ago by jdemeyer (previous) (diff)

comment:346 Changed 3 years ago by git

  • Commit changed from 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf to 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

411fe4e: Simplify/fix the logic of some biseq functions

comment:347 Changed 3 years ago by git

  • Commit changed from 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8 to 8db9f6542a54ead67c1df3d069103b620e85eea6

Branch pushed to git repo; I updated commit sha1. New commits:

cf166a6: Generalise biseq_max_overlap() and rename as biseq_reverse_contains()
8db9f65: Various improvements to BoundedIntegerSequence

comment:348 Changed 3 years ago by jdemeyer

My review of this is done now. I'll let you look at my commits.

comment:349 Changed 3 years ago by jdemeyer

I have made one important functional change: you should change biseq_max_overlap(A, B) to biseq_reverse_contains(B, A, 1) in the follow-up tickets.

I also fixed a few bugs and changed some exceptions but there should be no other functional changes.

comment:350 follow-up: Changed 3 years ago by jdemeyer

  • Status changed from needs_review to needs_work
  • Work issues set to interrupt handling

We are still missing sig_* functions to allow interrupts. But I prefer for you to review my changes before adding those.

comment:351 in reply to: ↑ 350 ; follow-up: Changed 3 years ago by SimonKing

Dear Jeroen,

thank you for reviewing!

Replying to jdemeyer:

We are still missing sig_* functions to allow interrupts. But I prefer for you to review my changes before adding those.

  • You changed the location of the (old) function bitset_fix. Why? Won't that increase the likelihood of a merge conflict?
  • When reading the function name biseq_reverse_contains, I'd expect it either to check for containment of a reversed subsequence, or to check for containment of a subsequence starting at the end of the supersequence. Neither is the case. What this function determines is an overlap of the end of the first and the beginning of the second argument. That said, I don't think the new function name is worse than the old one.

The other changes seem at least reasonable, some are needed. Thanks for spotting it.

In any case, there will be a merge conflict with the next follow-up ticket. But I guess it can hardly be avoided. I think we can now take care of signal handling.

comment:352 Changed 3 years ago by SimonKing

And one thing is not correct in the description of biseq_reverse_contains:

  Return ``i`` such that the bounded integer sequence ``S1`` starts with
  the sequence ``S2[i:]``, where ``start <= i < S2.length``, or return
  ``-1`` if no such ``i`` exists.

If you apply the function (whatever it is named) to <2,3,2,3,1> and <1,2,3,2,3> (I just noticed that you also exchanged the rôles of S1 and S2!), then I want it to yield the maximal overlap: it should return 1, not 3. That is not clear from the new description.
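For clarity, the behaviour I want as a pure-Python reference (illustrative only, not the Cython code): the smallest admissible i should be returned, so that the overlap is maximal.

def startswith_tail_model(S1, S2, start):
    # smallest i >= start such that S1 starts with S2[i:], else -1
    for i in range(start, len(S2)):
        if S1[:len(S2) - i] == S2[i:]:
            return i
    return -1

assert startswith_tail_model([2,3,2,3,1], [1,2,3,2,3], 1) == 1   # maximal overlap, not 3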

comment:353 Changed 3 years ago by SimonKing

Hmmmm. I am less and less happy about the "contains" in the name biseq_reverse_contains. Neither is the first argument contained in the second, nor is the second contained in the first. And even if it were: we are not interested in mere containment, but in whether a terminal segment of one argument coincides with an initial segment of the other.

comment:354 follow-ups: Changed 3 years ago by SimonKing

What do you think about the name biseq_start_of_overlap, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?

comment:355 Changed 3 years ago by SimonKing

  • Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

comment:356 Changed 3 years ago by SimonKing

  • Commit changed from 8db9f6542a54ead67c1df3d069103b620e85eea6 to 63d2693eb4915632e3cb806730a4eb56dc26008e

For the record: With git trac push 15820, only the branch of this ticket was changed, but not the sha1. Hence, I have to change it manually.

comment:357 Changed 3 years ago by SimonKing

By the way: Note that I've formulated error messages such as ValueError: BoundedIntegerSequence.index(x): x(=32) not in sequence by taking tuples as a good example:

sage: (1,2,3).index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-e2928ab3b336> in <module>()
----> 1 (Integer(1),Integer(2),Integer(3)).index(Integer(4))

ValueError: tuple.index(x): x not in tuple

Your error message is more similar to what is done for lists:

sage: [1,2,3].index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-e163a1d1b274> in <module>()
----> 1 [Integer(1),Integer(2),Integer(3)].index(Integer(4))

ValueError: 4 is not in list

But that's not more than a side-note. I accept your change.

comment:358 in reply to: ↑ 351 Changed 3 years ago by jdemeyer

Replying to SimonKing:

  • You changed the location of the (old) function bitset_fix. Why?

Because I think the new location is better than the old one: it seemed out of place in the list of functions that generate limb patterns.

Won't that increase the likelihood of a merge conflict?

Well, every change increases the likelihood of a merge conflict... But I don't know of any other tickets changing bitset code, so it won't be that bad.

comment:359 in reply to: ↑ 354 Changed 3 years ago by jdemeyer

Replying to SimonKing:

What do you think about the name biseq_start_of_overlap, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?

Well, the thing that bothered me most about the old biseq_max_overlap function was the implicit start index of 1, which is unexpected (why 1?) and couldn't be changed (unlike biseq_contains).

I changed the names and ordering of the arguments to make it more clear that biseq_contains and biseq_reverse_contains are in fact very similar: the first finds i such that S1[i:] starts with S2, the second finds i such that S1 starts with S2[i:].

I'm certainly open to changes, I just would like the functions to be as intuitive as possible for people not knowing about your application to quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.
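In pure-Python terms, the parallel I have in mind looks like this (models only, not the actual Cython code); both scan for the smallest admissible i, and only the parts of S1 and S2 being compared differ:

def contains_model(S1, S2, start):
    # smallest i >= start such that S1[i:] starts with S2, else -1
    for i in range(start, len(S1) - len(S2) + 1):
        if S1[i:i + len(S2)] == S2:
            return i
    return -1

def reverse_contains_model(S1, S2, start):
    # smallest i >= start such that S1 starts with S2[i:], else -1
    for i in range(start, len(S2)):
        if S1[:len(S2) - i] == S2[i:]:
            return i
    return -1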

comment:360 Changed 3 years ago by jdemeyer

  • Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
  • Modified changed from 12/05/14 09:28:15 to 12/05/14 09:28:15

comment:361 Changed 3 years ago by SimonKing

Apparently the trac pages are not available because of an internal error, so I'm trying to comment using the sage-dev script.

You wrote:

I'm certainly open for changes, I just would like the functions to be as intuitive as possible for people not knowing about your application of quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.

Here I disagree. If I have two lists A and B, then the statement "A overlaps with B" makes immediate sense to me. The only open question is whether "A overlaps with B" means that the overlap of the two lists is supposed to be at the end of A or at the end of B.

comment:362 Changed 3 years ago by jdemeyer

  • Commit changed from 63d2693eb4915632e3cb806730a4eb56dc26008e to 0a03d71d9df26c42cf91dab3df51ca549c434718

New commits:

0a03d71: More descriptive function name for bounded integer sequences

comment:363 Changed 3 years ago by jdemeyer

To reduce unneeded merges, I have cherry-picked your commit on top of my branch.

comment:364 in reply to: ↑ 354 Changed 3 years ago by jdemeyer

Replying to SimonKing:

What do you think about switching back to the old rôle of the arguments

I think it's a bad idea.

Look at the following (now removed) code from you:

if other.startswith(self):
    return self
cdef mp_size_t i = biseq_max_overlap(self.data, other.data)

Note that the case i == 0 of biseq_max_overlap(B, A) corresponds to biseq_startswith(A, B). An unexpected reordering in my opinion.

In my proposal, the three functions biseq_startswith, biseq_contains and biseq_reverse_contains (whatever the name) all check whether some part of S1 starts with some part of S2. The only thing that changes is which parts of S1 and S2 are considered.

comment:365 Changed 3 years ago by jdemeyer

Better proposal: rename biseq_reverse_contains as biseq_startswith_tail. I think that's clear: check whether a sequence starts with the tail part of a sequence.

comment:366 Changed 3 years ago by git

  • Commit changed from 0a03d71d9df26c42cf91dab3df51ca549c434718 to 1560ce8dff16ed1fceabcc5656312f718d874211

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

1560ce8: Rename biseq_reverse_contains as biseq_startswith_tail

comment:367 Changed 3 years ago by SimonKing

OK, I am fine with that change.

comment:368 Changed 3 years ago by jdemeyer

Thanks for the support!

For me, this ticket is positive_review except for the interrupt stuff.

comment:369 Changed 3 years ago by SimonKing

Since yesterday evening, I have tried about 20 times to push my changes (implementing the interrupt handling), but it doesn't work. Any idea why?

As usual, the error message is not very helpful:

> git trac push 15820
Pushing to Trac #15820...
Guessed remote branch: u/SimonKing/ticket/15820
Enter passphrase for key '/home/king/.ssh/id_rsa': 
Traceback (most recent call last):                                                                               
  File "/home/king/bin/git-trac", line 18, in <module>                                                           
    cmdline.launch()                                                                                             
  File "/home/king/Sage/git/git-trac-command/git_trac/cmdline.py", line 210, in launch                           
    app.push(ticket_number, remote=args.remote, force=args.force)
  File "/home/king/Sage/git/git-trac-command/git_trac/app.py", line 194, in push
    self.repo.push(remote, force)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_repository.py", line 181, in push
    self.git.echo.push('trac', refspec)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 341, in meth
    return self.execute(git_cmd, *args, **kwds)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 98, in execute
    popen_stderr=subprocess.PIPE)
  File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 263, in _run
    raise GitError(result)
git_trac.git_error.GitError

Is that a problem on my side, or on trac?

Anyway, the diff would be

  • src/sage/data_structures/bounded_integer_sequences.pyx

diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx
index 7f5e632..ccda54c 100644
--- a/src/sage/data_structures/bounded_integer_sequences.pyx
+++ b/src/sage/data_structures/bounded_integer_sequences.pyx
@@ -124,7 +124,9 @@ cdef bint biseq_init(biseq_t R, mp_size_t l, mp_bitcnt_t itemsize) except -1:
         totalbitsize = l * itemsize
     else:
         totalbitsize = 1
+    sig_on()
     bitset_init(R.data, totalbitsize)
+    sig_off()
     R.length = l
     R.itembitsize = itemsize
     R.mask_item = limb_lower_bits_up(itemsize)
@@ -140,7 +142,9 @@ cdef bint biseq_init_copy(biseq_t R, biseq_t S) except -1:
     Initialize ``R`` as a copy of ``S``.
     """
     biseq_init(R, S.length, S.itembitsize)
+    sig_on()
     bitset_copy(R.data, S.data)
+    sig_off()
 
 #
 # Conversion
@@ -162,6 +166,7 @@ cdef bint biseq_init_list(biseq_t R, list data, size_t bound) except -1:
 
     biseq_init(R, len(data), BIT_COUNT(bound|<size_t>1))
 
+    sig_check()
     for item in data:
         item_c = item
         if item_c > bound:
@@ -187,8 +192,10 @@ cdef bint biseq_init_concat(biseq_t R, biseq_t S1, biseq_t S2) except -1:
     The result is written into ``R``, which must not be initialised
     """
     biseq_init(R, S1.length + S2.length, S1.itembitsize)
+    sig_on()
     bitset_lshift(R.data, S2.data, S1.length * S1.itembitsize)
     bitset_or(R.data, R.data, S1.data)
+    sig_off()
 
 
 cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2) except -1:
@@ -207,6 +214,7 @@ cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2) except -1:
         return False
     if S2.length == 0:
         return True
+    sig_check()
     return mpn_equal_bits(S1.data.bits, S2.data.bits, S2.data.size)
 
 
@@ -304,7 +312,9 @@ cdef bint biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop
 
     if step == 1:
         # Slicing essentially boils down to a shift operation.
+        sig_on()
         bitset_rshift(R.data, S.data, start*S.itembitsize)
+        sig_off()
         return 0
 
     # In the general case, we move item by item.
@@ -340,6 +350,7 @@ cdef mp_size_t biseq_contains(biseq_t S1, biseq_t S2, mp_size_t start) except -2
     if S2.length == 0:
         return start
     cdef mp_size_t index
+    sig_check()
     for index from start <= index <= S1.length-S2.length:
         if mpn_equal_bits_shifted(S2.data.bits, S1.data.bits,
                 S2.length*S2.itembitsize, index*S2.itembitsize):
@@ -373,6 +384,7 @@ cdef mp_size_t biseq_startswith_tail(biseq_t S1, biseq_t S2, mp_size_t start) ex
     if S1.length < S2.length - start:
         start = S2.length - S1.length
     cdef mp_size_t index
+    sig_check()
     for index from start <= index < S2.length:
         if mpn_equal_bits_shifted(S1.data.bits, S2.data.bits,
                 (S2.length - index)*S2.itembitsize, index*S2.itembitsize):
@@ -1335,7 +1347,9 @@ cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize
     cdef BoundedIntegerSequence out = BoundedIntegerSequence.__new__(BoundedIntegerSequence)
     # bitset_unpickle assumes that out.data.data is initialised.
     biseq_init(out.data, length, itembitsize)
+    sig_on()
     bitset_unpickle(out.data.data, bitset_data)
+    sig_off()
     return out
 
 def _biseq_stresstest():

comment:370 Changed 3 years ago by jdemeyer

I can't help with git...

comment:371 Changed 3 years ago by git

  • Commit changed from 1560ce8dff16ed1fceabcc5656312f718d874211 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184

Branch pushed to git repo; I updated commit sha1. New commits:

4e9e1c5: Add interrupt handling to bounded integer sequences

comment:372 Changed 3 years ago by jdemeyer

  • Status changed from needs_work to needs_review
  • Work issues interrupt handling deleted

comment:373 follow-up: Changed 3 years ago by SimonKing

Questions:

  • Is the stress test really running forever? It is supposed to take a few seconds. I will test in a few minutes.
  • When should one use sig_check() and when sig_on()/sig_off()? I thought one uses sig_check() when an error or a return that is not caused by a keyboard interrupt may happen. But perhaps I am mistaken.

comment:374 follow-up: Changed 3 years ago by SimonKing

Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!

comment:375 in reply to: ↑ 374 ; follow-up: Changed 3 years ago by SimonKing

Replying to SimonKing:

Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!

Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.

comment:376 Changed 3 years ago by SimonKing

Should we address the interrupt test in a different way? If not, I think it is a positive review from my side. But I'd appreciate if you could answer my question on sig_on/sig_off vs. sig_check.

comment:377 in reply to: ↑ 375 Changed 3 years ago by jdemeyer

Replying to SimonKing:

Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.

I also tested a bit more and the exception indeed happens most often in randint() but also sometimes in other places.

comment:378 in reply to: ↑ 373 Changed 3 years ago by jdemeyer

Replying to SimonKing:

  • When should one use sig_check() and when sig_on()/sig_off()?

Does http://www.sagemath.org/doc/developer/coding_in_cython.html#interrupt-and-signal-handling answer your question?
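In short, per that manual section: sig_check() is a cheap periodic check to be placed inside long-running pure Cython loops, while sig_on()/sig_off() bracket calls into external C libraries (gmp, in this case) so that a signal arriving during the call is turned into a Python exception. A minimal Cython-style sketch (illustrative only; in Sage of this era the sig_* declarations come from include "sage/ext/interrupt.pxi"):

def interruptible_sum(n):
    # a long pure-Cython loop: check cheaply and periodically
    total = 0
    for i in range(n):
        sig_check()   # raises KeyboardInterrupt if Ctrl-C was pressed
        total += i
    return total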

comment:379 follow-up: Changed 3 years ago by SimonKing

Should we add tests in the spirit of this:

sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(4,[1,2,3,2,3,2,3])
sage: while 1:
....:     C = B+B
....:     
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-dda173c83bbe> in <module>()
----> 1 while Integer(1):
      2     C = B+B
      3 

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sage_python_check_interrupt (build/cythonized/sage/ext/c_lib.c:1683)()

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()

KeyboardInterrupt: 
sage: while 1:
....:     C = B+B
....:     
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-4-dda173c83bbe> in <module>()
      1 while Integer(1):
----> 2     C = B+B
      3 

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence.__add__ (build/cythonized/sage/data_structures/bounded_integer_sequences.c:10964)()

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_concat (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7632)()

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()

KeyboardInterrupt: 

As you can see, the interruption already occurs in the correct place on the second try.

comment:380 in reply to: ↑ 379 ; follow-up: Changed 3 years ago by jdemeyer

Replying to SimonKing:

Should we add tests in the spirit of this:

We could, but I'm not sure it is worth the trouble.

comment:381 in reply to: ↑ 380 Changed 3 years ago by SimonKing

  • Status changed from needs_review to positive_review

Replying to jdemeyer:

Replying to SimonKing:

Should we add tests in the spirit of this:

We could, but I'm not sure it is worth the trouble.

Then I guess I can set this to positive review, right?

comment:382 Changed 3 years ago by vbraun

  • Branch changed from u/jdemeyer/ticket/15820 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184
  • Resolution set to fixed
  • Status changed from positive_review to closed

comment:383 follow-up: Changed 3 years ago by tmonteil

  • Commit 4e9e1c51f6524bfa2d2c55d197b82a983dd01184 deleted

I just saw this ticket on the trac timeline; I wonder whether two similar objects are being developed in parallel by disjoint sets of people, see #17013.

comment:384 in reply to: ↑ 383 Changed 3 years ago by SimonKing

Replying to tmonteil:

I just saw this ticket on the trac timeline; I wonder whether two similar objects are being developed in parallel by disjoint sets of people, see #17013.

Indeed there is an obvious overlap. If I understand correctly, #17013 uses char * as the underlying data structure, which means they are restricted to an alphabet of size at most 255. I have experimented with a similar implementation, too, but found that an implementation based on gmp gives better speed than what I was able to obtain with char *.
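A rough pure-Python illustration of the storage difference (illustrative only; it assumes BIT_COUNT(b) is the bit length of b, as the branch's use of BIT_COUNT(bound|1) suggests):

for bound in (2, 8, 100, 255):
    itembits = (bound | 1).bit_length()
    print(bound, itembits)   # bits per item here, vs. a full 8 for char*

Besides compactness, the gmp-based layout lets comparisons and shifts run limb by limb, i.e. many packed items per machine word, which is presumably where the speed advantage comes from.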

Note: See TracTickets for help on using tickets.