Opened 7 years ago

Closed 6 years ago

# Implement sequences of bounded integers

Reported by: SimonKing
Owned by: —
Priority: major | Milestone: sage-6.4 | Component: algebra | Keywords: sequence bounded integer
Cc: JStarx, jhpalmieri, was, stumpc5, saliola, SimonKing, gmoose05, nthiery, virmaux, ncohen, hivert
Authors: Simon King, Jeroen Demeyer | Reviewers: Jeroen Demeyer, Simon King
Merged in: N/A | Commit: 4e9e1c5 | Dependencies: #17195, #17196

#12630 has introduced tools to compute with quiver algebras and quiver representations. It is all Python code, with paths internally stored as lists of labelled edges of digraphs.

The aim of this ticket is to provide an efficient implementation of lists of bounded integers, with concatenation. This can be used to implement paths of a quiver more efficiently, but it may also serve other purposes (say, elements of free groups, or other applications in sage-combinat).

### comment:1 Changed 7 years ago by SimonKing

There are various possible more or less obvious data formats:

1. Store a path as `char*` or `unsigned int*`, hence as a list of integers, each integer corresponding to one edge of the quiver. `char*` would bound the total number of edges of the quiver to 256 (which might be too small), whereas `unsigned int*` might consume too much memory.
2. Store a path as `char*`, each integer corresponding to an outgoing edge of a vertex. Hence, the total number of edges may be arbitrary, we only have a local bound of 128 on the number of outgoing edges---this hopefully is enough (it certainly is enough for me).
3. As in 1., but the representation is compressed: If we have fewer than 16 edges, then one byte can represent a pair of edges, and a single 32- or 64-bit int could store even longer sequences of edges.
4. As in 2., but the representation is compressed: If each vertex has less than 8 outgoing edges, then a byte can represent a pair and a single 32-bit word can represent a 10-tuple of edges (if I did not miscalculate).

In 2. and 4. (and probably in 1. and 3. as well) we should explicitly store the start and end point of the path. The length should probably be stored as well.
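The compression idea behind formats 3 and 4 can be sketched in plain Python; the helpers `pack`/`unpack` below are illustrative, not part of any Sage code:

```python
# Hypothetical sketch of the compressed formats: pack a list of
# integers, each known to be smaller than `bound`, into one word.
def pack(seq, bound):
    bits = max(1, (bound - 1).bit_length())  # bits needed per entry
    word = 0
    for i, x in enumerate(seq):
        assert 0 <= x < bound
        word |= x << (i * bits)
    return word, bits

def unpack(word, bits, length):
    mask = (1 << bits) - 1
    return [(word >> (i * bits)) & mask for i in range(length)]

packed, bits = pack([3, 1, 4, 1, 5], 8)   # bound 8 -> 3 bits per entry
assert unpack(packed, bits, 5) == [3, 1, 4, 1, 5]
```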

In my applications, the following operations on paths need to be fast: concatenation, hash/comparison, testing whether a path p starts with a path q, and (less importantly) whether a path p contains a path q.

Concatenation is particularly easy in 1. and 2. (just concatenation of lists, achieved by `memcpy`). It is a bit more difficult in 3. and 4., where the bytes of the second factor need to be shifted to make the compressed data seamless.
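The "seamless" shift needed for concatenation in the compressed formats amounts to shifting the second factor by the bit-length of the first and OR-ing; a minimal sketch, assuming a hypothetical fixed chunk size of 3 bits:

```python
BITS = 3  # assumed bits per entry

def pack(seq):
    word = 0
    for i, x in enumerate(seq):
        word |= x << (i * BITS)
    return word

def concat(word1, len1, word2):
    # shift the second factor past the first, then merge
    return word1 | (word2 << (len1 * BITS))

a, b = [1, 2, 3], [4, 5]
assert concat(pack(a), len(a), pack(b)) == pack(a + b)
```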

Hash and comparison are easy in all four formats, but of course best speed would be obtained in 4., simply since the amount of data that needs to be processed is smaller than in 1.-3.

In all four proposed data formats, it is easy to test whether path p starts with path q. Again, having the data stored in as little memory as possible will speed things up.

It is easy in 1. to test whether path p contains path q (as a subpath that is not necessarily an initial segment), whereas this becomes a bit more difficult in 2.-4.
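On the compressed words, "p starts with q" is one mask-and-compare, while "p contains q" needs a shift per candidate offset; a sketch under the same hypothetical 3-bit chunking as above:

```python
BITS = 3

def pack(seq):
    word = 0
    for i, x in enumerate(seq):
        word |= x << (i * BITS)
    return word

def startswith(p, q, qlen):
    mask = (1 << (qlen * BITS)) - 1
    return (p & mask) == q

def contains(p, plen, q, qlen):
    # try every offset; each shift realigns the compressed data
    return any(startswith(p >> (i * BITS), q, qlen)
               for i in range(plen - qlen + 1))

p = pack([1, 2, 3, 4, 5])
assert startswith(p, pack([1, 2]), 2)
assert contains(p, 5, pack([3, 4]), 2)
assert not contains(p, 5, pack([4, 3]), 2)
```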

What format would you suggest? Perhaps it makes sense to have some initial implementations of all four, and then compare directly.

### comment:2 follow-up: ↓ 3 Changed 7 years ago by nthiery

I'd say:

• Some implementation without limitations (the one from the previous ticket I guess)
• Whichever of 1-4 covers your needs (256 edges should already be quite big for Gröbner basis calculations, right?)
• As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.

But that's just my 2 cents ... I don't have specific experience!

Cheers,

### comment:3 in reply to: ↑ 2 Changed 7 years ago by SimonKing

Hi Nicolas,

> • Some implementation without limitations (the one from the previous ticket I guess)

An implementation which represents edges by `unsigned long long` can be considered "unlimited".

> • Whichever of 1-4 covers your needs (256 edges should already be quite big for Gröbner basis calculations, right?)

Yes, but I want a general code.

> • As much as possible of the code shared between all implementations to make it easy to add new ones if deemed useful.

Actually, on the level of paths, the implementations come in pairs: the compressed representations differ from the non-compressed ones---at the end of the day, we just have different ways of representing sequences of integers. Only when it comes to interpreting a sequence of integers as a path in a quiver will we have further differences: does `[1,2]` mean "start with arrow 1 of the quiver and continue with arrow 2 of the quiver", or does it mean "travel along outgoing arrow 1 of the vertex we are currently located at, and then continue with outgoing arrow 2 of the vertex we just arrived at"?
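The two interpretations can be sketched as follows; the tiny quiver (vertices `'a'`, `'b'`, `'c'`) and the helper names are hypothetical, purely for illustration:

```python
# Global numbering: arrows_global[i] = (source, target) of arrow i.
arrows_global = {1: ('a', 'b'), 2: ('b', 'c')}
# Local numbering: arrows_local[v][i] = target of the i-th outgoing
# arrow at vertex v.
arrows_local = {'a': ['b'], 'b': ['c', 'a'], 'c': []}

def walk_global(seq):
    # "[1, 2]" = arrow 1 of the quiver, then arrow 2 of the quiver
    v = arrows_global[seq[0]][0]
    for i in seq:
        src, tgt = arrows_global[i]
        assert src == v, "arrows do not compose"
        v = tgt
    return v

def walk_local(start, seq):
    # "[0, 0]" = outgoing arrow 0 at `start`, then arrow 0 there, etc.
    v = start
    for i in seq:
        v = arrows_local[v][i]
    return v

assert walk_global([1, 2]) == 'c'
assert walk_local('a', [0, 0]) == 'c'
```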

Currently, I prefer a compressed representation by `unsigned long long*`. Rationale:

• This will work even for extremely large quivers---the only restriction is that the number of edges can be represented as an `unsigned long long` (a larger quiver won't fit into the computer's memory anyway).
• For small quivers, even paths of length 10 will easily fit into a single `unsigned long long`. That's quite efficient, both memory- and speed-wise.

A bit later I will post experimental code (not yet talking about quiver paths but about sequences of integers) giving some evidence on efficiency. And then we can still see how we interpret the integer sequences: Having a global or a local list of arrows?

### comment:4 Changed 7 years ago by SimonKing

I have attached a toy (or proof-of-concept) implementation; see path_test.pyx. In a real implementation in compressed representation, it would be the job of a `PathSemigroup` to find out how many edges fit into one word. And of course, in the toy implementation I am not trying to translate an integer sequence into the string representation of an element of the path semigroup.

What implementation would you suggest to use in "reality"? Is there another implementation of "sequences of bounded integers" that you would recommend? And would you enumerate the arrows of a quiver locally (i.e., only number the outgoing edges at each vertex) or globally? The former will result in better compression, but would make it more difficult to test for paths p,q whether there are paths r,s with `p==r*q*s`.

Now for the timings, and sorry for the long post...

Here, I am timing concatenation, hash, comparison, comparing initial segments. In all examples, we use the following paths of various sizes.

```p5 = Path(ZZ, 5, range(5), 1, 1)
p25 = Path(ZZ, 25, range(25), 1, 1)
p50 = Path(ZZ, 50, range(50), 1, 1)
p100 = Path(ZZ, 100, range(100), 1, 1)
p1000 = Path(ZZ, 1000, range(1000), 1, 1)
q5 = Path(ZZ, 5, range(1,6), 1, 1)
q25 = Path(ZZ, 25, range(1,26), 1, 1)
q50 = Path(ZZ, 50, range(1,51), 1, 1)
q100 = Path(ZZ, 100, range(1,101), 1, 1)
q1000 = Path(ZZ, 1000, range(1,1001), 1, 1)
r5 = Path(ZZ, 5, range(1,5)+[0], 1, 1)
r25 = Path(ZZ, 25, range(1,25)+[0], 1, 1)
r50 = Path(ZZ, 50, range(1,50)+[0], 1, 1)
r100 = Path(ZZ, 100, range(1,100)+[0], 1, 1)
r1000 = Path(ZZ, 1000, range(1,1000)+[0], 1, 1)
s5 = Path(ZZ, 5, range(5), 1, 1)
s25 = Path(ZZ, 25, range(25), 1, 1)
s50 = Path(ZZ, 50, range(50), 1, 1)
s100 = Path(ZZ, 100, range(100), 1, 1)
s1000 = Path(ZZ, 1000, range(1000), 1, 1)
```

We are doing the following tests.

Concatenation:

```%timeit p5*q5
%timeit p5*q25
%timeit p25*q25
%timeit p25*q50
%timeit p50*q50
%timeit p50*q100
%timeit p100*q100
%timeit p100*q1000
%timeit p1000*q1000
```

Hash:

```%timeit hash(p5)
%timeit hash(p25)
%timeit hash(p50)
%timeit hash(p100)
%timeit hash(p1000)
```

Comparison:

1. equal
```%timeit p5==s5
%timeit p25==s25
%timeit p50==s50
%timeit p100==s100
%timeit p1000==s1000
```
2. obviously different (first item differs)
```%timeit p5==r5
%timeit p25==r25
%timeit p50==r50
%timeit p100==r100
%timeit p1000==r1000
```
3. less obviously different (only the last item differs)
```%timeit q5==r5
%timeit q25==r25
%timeit q50==r50
%timeit q100==r100
%timeit q1000==r1000
```

Startswith:

```%timeit q1000.startswith(q100)  # it does start with
```

Uncompressed integer lists

Here, we define `Path=Path_v1` (see path_test.pyx).

Using `ctypedef unsigned long block`

```sage: %timeit p5*q5
1000000 loops, best of 3: 739 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 801 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 810 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 885 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 959 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.02 us per loop
sage: %timeit p100*q1000
100000 loops, best of 3: 1.99 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 3.06 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 135 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 166 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 214 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 333 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.09 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.63 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 5.91 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 11 us per loop
sage: %timeit p100==s100
10000 loops, best of 3: 23 us per loop
sage: %timeit p1000==s1000
1000 loops, best of 3: 212 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 634 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 659 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 659 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 686 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 698 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.65 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 5.83 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 11.5 us per loop
sage: %timeit q100==r100
10000 loops, best of 3: 22 us per loop
sage: %timeit q1000==r1000
1000 loops, best of 3: 213 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 325 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 311 ns per loop
```

Using `ctypedef unsigned short block`

```sage: %timeit p5*q5
1000000 loops, best of 3: 700 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 738 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 820 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 808 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 877 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 909 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.49 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 2.13 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 138 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 179 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 236 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 383 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.61 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.08 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 3.42 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 6.53 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 14.9 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 137 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 613 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 591 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 602 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 603 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 635 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.15 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.63 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.45 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.8 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 140 us per loop
sage: q1000.startswith(q100)
True
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 336 ns per loop
sage: q1000.startswith(r100)
False
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 325 ns per loop
```

Using `ctypedef unsigned char block`

```sage: %timeit p5*q5
1000000 loops, best of 3: 679 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 725 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 760 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 772 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 761 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 790 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.1 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.31 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 133 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 176 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 234 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 374 ns per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 2.53 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 1.08 us per loop
sage: %timeit p25==s25
100000 loops, best of 3: 3.36 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 6.3 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 12.5 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 121 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 581 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 589 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 636 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.06 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.46 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.51 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.7 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 121 us per loop
sage: q1000.startswith(q100)
True
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 324 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 315 ns per loop
```

Compressed integer lists

Here, we define `Path=Path_v2` (see path_test.pyx), and do

```sage: SizeManager.set_edge_number(27)
```

so that there is a total of only 27 edges (resp. 27 is the maximal number of outgoing arrows at a vertex). In particular, all integers in the sequences below are taken "mod 27". Since 27 needs 5 bits, a compressed representation using `unsigned char*` won't make sense. Hence, I am only testing long and short.
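The capacity arithmetic behind that choice can be checked directly (a sketch; the quantities are implied by the comment above, not taken from path_test.pyx):

```python
# With 27 edges, each edge label 0..26 needs 5 bits.
n_edges = 27
bits = (n_edges - 1).bit_length()
assert bits == 5
assert 64 // bits == 12        # edges per 64-bit ("long") block
assert 64 - 12 * bits == 4     # leftover bits per block
assert 8 // bits == 1          # a char block would hold just one edge
```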

Using `ctypedef unsigned long block`

```sage: %timeit p5*q5
1000000 loops, best of 3: 676 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 671 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 683 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 725 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 759 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 787 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 811 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.44 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.61 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 132 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 134 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 142 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 155 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 466 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 710 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.71 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.6 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.4 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 38 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 692 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 715 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 724 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 729 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 752 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 706 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.69 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.59 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 4.4 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 37.6 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 272 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 257 ns per loop
```

Using `ctypedef unsigned short block`

```sage: %timeit p5*q5
1000000 loops, best of 3: 718 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 682 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 710 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 740 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 722 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 781 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 858 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.98 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 2.17 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 132 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 168 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 206 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 951 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 723 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.77 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.93 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 5.32 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 47.5 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 577 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 586 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 599 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 588 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 607 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 754 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.87 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.89 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 5.28 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 47.3 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 272 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 256 ns per loop
```

### comment:5 Changed 7 years ago by SimonKing

It turns out that there are bugs in the toy code. But that's no surprise...

### comment:6 Changed 7 years ago by SimonKing

I have fixed the problems in path_test.pyx.

The bugs were:

1. bitshift by a wrong offset, and
2. the bits that aren't used (e.g., the last 4 bits of a `uint64_t` when storing 12 arrows of 5 bits each) should be kept blank.

The latter involves an additional bitwise "and". So, probably the timings will slightly deteriorate.
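The extra bitwise "and" can be sketched like this (illustrative only; the constants match the 27-edge example, 12 five-bit chunks per 64-bit block):

```python
BITS, PER_BLOCK = 5, 12
USED_MASK = (1 << (BITS * PER_BLOCK)) - 1   # low 60 bits are real data

def shift_in(block, chunk):
    # push a new 5-bit chunk in at the bottom; the mask keeps the
    # 4 would-be garbage bits at the top of the block blank
    return ((block << BITS) | chunk) & USED_MASK

b = 0
for c in [1, 2, 3]:
    b = shift_in(b, c)
assert b == (1 << 10) | (2 << 5) | 3
```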

### comment:7 Changed 7 years ago by SimonKing

There is one other variation that I want to test.

Currently, if the number of edges fits into 5 bits, then a 64 bit word is filled with data of 12 edges, plus 4 bits of garbage. The garbage must always be zero bits, for otherwise the shift operations would pollute real data with the garbage bits. To keep the garbage zero involves additional operations.

Alternatively, one could try to put fewer arrows into one word, so that there are no garbage bits. This can be done by encoding each arrow by a number of bits that is a power of 2. Hence, in the setting above, one would encode each edge by 8 rather than 5 bits, fitting 8 arrows into one 64-bit word. It is less dense; however, the code gets simpler and should have less overhead.
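Rounding the chunk size up to the next power of two is a one-liner; a sketch (the helper name `chunk_size` is hypothetical):

```python
def chunk_size(bits_needed):
    # smallest power of 2 that is >= bits_needed
    size = 1
    while size < bits_needed:
        size *= 2
    return size

assert chunk_size(5) == 8          # 5-bit arrows -> 8-bit chunks
assert 64 // chunk_size(5) == 8    # 8 arrows per 64-bit word, no waste
assert chunk_size(3) == 4          # e.g. fewer than 8 arrows -> 4 bits
```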

### comment:8 Changed 7 years ago by SimonKing

I have replaced path_test.pyx again, adding two more implementations.

Do you have further suggestions on storage of lists of bounded integers? If you haven't, then I'll try to assess the timings stated in the various comments, according to what I expect to be useful in my applications.

Here are details on the two new implementations:

In `Path_v3`, I use Sage's `Integer` class as a bit list. Sounds crazy, but it works rather well. I assume that bitshifts and comparison (and that's what we are using here) are internally written in assembler, and for us the additional benefit is that we do not need to worry about the size of the blocks (8 bit? 32 bit? 64 bit? 128 bit?) in which we store stuff---`Integer` does it for us. Perhaps the timings could be improved further by using the underlying C library (GMP's `mpz_t`) directly.
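The `Path_v3` idea has a rough analogue in plain Python, where the arbitrary-precision `int` plays the role of Sage's `Integer`/GMP as the bit container (a sketch with illustrative helper names, not the attachment's code):

```python
BITS = 5                      # assumed bits per arrow
MASK = (1 << BITS) - 1

def append(word, arrow):
    # the int grows as needed; no block-size bookkeeping on our side
    return (word << BITS) | arrow

def to_list(word, length):
    return [(word >> (BITS * i)) & MASK for i in reversed(range(length))]

w = 0
for a in [7, 0, 26]:
    w = append(w, a)
assert to_list(w, 3) == [7, 0, 26]
```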

In `Path_v4`, I provide a slightly simplified version of `Path_v2`: the data is compressed, but such that the chunks used to store one arrow fit seamlessly into a memory block. Hence, when we use 64-bit blocks and have 27 arrows, `Path_v2` stores the arrows in chunks of 5 bits (hence it can store 12 arrows in one memory block, leaving 4 bits empty), whereas `Path_v4` stores the arrows in chunks of 8 bits (hence it can only store 8 arrows in one memory block, but since this fits neatly, the code can be simplified).

The timings for `Path_v3`:

```sage: %timeit p5*q5
1000000 loops, best of 3: 645 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 631 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 635 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 1.23 us per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 1.82 us per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 1.83 us per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.37 us per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.82 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.78 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 164 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 197 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 210 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 274 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 1.94 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 667 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 638 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 658 ns per loop
sage: %timeit p1000==s1000
1000000 loops, best of 3: 817 ns per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 746 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 723 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 769 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 731 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 733 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 710 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 733 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 718 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 719 ns per loop
sage: %timeit q1000==r1000
1000000 loops, best of 3: 755 ns per loop
sage: %timeit q1000.startswith(q100)
100000 loops, best of 3: 2.46 us per loop
sage: %timeit q1000.startswith(r100)
100000 loops, best of 3: 2.45 us per loop
```

The timings for `Path_v4`:

```sage: %timeit p5*q5
1000000 loops, best of 3: 730 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 762 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 806 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 849 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 854 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 931 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 930 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.97 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.45 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 136 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 159 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 182 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 600 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 700 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.65 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.48 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.21 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 36.4 us per loop
sage: %timeit p5==r5
100000 loops, best of 3: 706 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 764 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 740 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 742 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 822 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 693 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 1.6 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 2.54 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 4.27 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 37.1 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 255 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 243 ns per loop
```

### comment:9 Changed 7 years ago by SimonKing

Hmmm. Perhaps I should have a look at `sage/misc/bitset.pxi`.

### comment:10 Changed 7 years ago by SimonKing

I couldn't directly use `sage/misc/bitset`, but I could learn from it: I have improved the `Path_v4` implementation (here, the arrows are encoded by chunks of memory whose size is a power of 2). This was possible by using `memcmp` in some places (not in the method `startswith()`, where a loop over the data is faster than `memcmp`) and, in particular, by using bitshift operations to replace multiplications and integer divisions (this is only possible with `Path_v4`, since there the factors are powers of 2 and can thus correspond to shifts).
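The shift-for-division trick can be sketched as follows (illustrative constants for `Path_v4`'s 8-bit chunks in 64-bit blocks; the helper `locate` is hypothetical):

```python
LOG_PER_BLOCK = 3    # 8 chunks per 64-bit block = 2**3
LOG_CHUNK = 3        # chunk size 8 bits = 2**3

def locate(i):
    # block index and in-block bit offset of arrow i,
    # using shifts/masks instead of // and %
    block = i >> LOG_PER_BLOCK                               # i // 8
    offset = (i & ((1 << LOG_PER_BLOCK) - 1)) << LOG_CHUNK   # (i % 8) * 8
    return block, offset

assert locate(0) == (0, 0)
assert locate(7) == (0, 56)
assert locate(12) == (1, 32)
```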

New timings with `Path=Path_v4` and `SizeManager.set_edge_number(27)`:

```sage: %timeit p5*q5
1000000 loops, best of 3: 684 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 726 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 798 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 862 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 827 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 924 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 915 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.91 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.42 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 148 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 158 ns per loop
sage: %timeit hash(p50)
10000000 loops, best of 3: 171 ns per loop
sage: %timeit hash(p100)
10000000 loops, best of 3: 194 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 594 ns per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 498 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 547 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 631 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 712 ns per loop
sage: %timeit p1000==s1000
100000 loops, best of 3: 2.42 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 499 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 514 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 549 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 526 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 545 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 506 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 544 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 589 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 704 ns per loop
sage: %timeit q1000==r1000
100000 loops, best of 3: 2.48 us per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 263 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 232 ns per loop
```

### comment:11 Changed 7 years ago by SimonKing

PS: I chose 27 edges, since this should be particularly bad for `Path_v4`. Indeed, in `Path_v2`, 12 arrows can be put into one `uint64_t`, whereas in `Path_v4` only 8 arrows fit into one `uint64_t`. So, if `Path_v4` turns out to be faster than `Path_v2` with 27 edges, then it is likely to be generally faster, I think.

### comment:12 Changed 7 years ago by SimonKing

From my side, this is the last part of the proof-of-concept implementations, before assessing the benchmark tests and choosing an implementation for the elements of path semigroups...

In `Path_v5`, I am using the GMP library directly, thus avoiding some overhead compared with `Path_v3`. The timings I get (with `Path=Path_v5` and `SizeManager.set_edge_number(27)`):

```sage: %timeit p5*q5
1000000 loops, best of 3: 720 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 814 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 823 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 808 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 828 ns per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 890 ns per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 866 ns per loop
sage: %timeit p100*q1000
1000000 loops, best of 3: 1.18 us per loop
sage: %timeit p1000*q1000
1000000 loops, best of 3: 1.38 us per loop
sage: %timeit hash(p5)
10000000 loops, best of 3: 159 ns per loop
sage: %timeit hash(p25)
10000000 loops, best of 3: 171 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 198 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 266 ns per loop
sage: %timeit hash(p1000)
1000000 loops, best of 3: 1.93 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 466 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 509 ns per loop
sage: %timeit p50==s50
1000000 loops, best of 3: 495 ns per loop
sage: %timeit p100==s100
1000000 loops, best of 3: 485 ns per loop
sage: %timeit p1000==s1000
1000000 loops, best of 3: 693 ns per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 451 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 468 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 463 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 468 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 482 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 448 ns per loop
sage: %timeit q25==r25
1000000 loops, best of 3: 462 ns per loop
sage: %timeit q50==r50
1000000 loops, best of 3: 448 ns per loop
sage: %timeit q100==r100
1000000 loops, best of 3: 457 ns per loop
sage: %timeit q1000==r1000
1000000 loops, best of 3: 470 ns per loop
sage: %timeit q1000.startswith(q100)
1000000 loops, best of 3: 673 ns per loop
sage: %timeit q1000.startswith(r100)
1000000 loops, best of 3: 657 ns per loop
```

### comment:13 follow-up: ↓ 14 Changed 7 years ago by ncohen

Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing. If the size of a block is a power of two it should be quite good, don't you think?

This being said, perhaps it should be implemented like the bitset stuff, in an algebra-free file, to store sequences of integers as you said.

How do you define your hashing function and your comparison function, by the way ?

Nathann

### comment:14 in reply to: ↑ 13 Changed 7 years ago by SimonKing

Hmmmmmmm... Well, looks like the last one is the best, even though it is a bit slow for hashing.

What do you mean by "the last one"? Using GMP's `mpz_t` directly, which is `Path_v5` in the attachment? Or the improved "semicompressed" version of `uint64_t*` (storing chunks of size `2^n` rather than in the smallest possible chunk size), which is `Path_v4`?

If the size of a block is a power of two it should be quite good, don't you think ?

Should be. My own summary of the timings above would be this. We have 6 benchmarks, and in the following list I am giving for each benchmark a small, a medium and a large example. In the end, `+` or `-` indicate whether the respective implementation did well/poorly in the three examples.

1. Concatenation
```                     5x5   50x50  1000x1000
Uncompressed long : 739 ns  885 ns 3.06 us  - -
Uncompressed short: 700 ns  808 ns 2.13 us
Uncompressed char : 679 ns  772 ns 1.31 us    +
Compressed   long : 676 ns  759 ns 1.61 us
Compressed   short: 718 ns  722 ns 1.98 us   +
Semicompr.   long : 684 ns  827 ns 1.42 us
Integer           : 645 ns 1.82 us 1.78 us  +-
GMP          mpz_t: 723 ns  799 ns 1.41 us
```
1. Hash
```                     5      100     1000
Uncompressed long : 135 ns 333 ns  2.09 us  +
Uncompressed short: 138 ns 383 ns  2.61 us  +--
Uncompressed char : 133 ns 374 ns  2.53 us  +
Compressed   long : 132 ns 155 ns   466 ns  +++
Compressed   short: 132 ns 206 ns   951 ns  +
Semicompr.   long : 148 ns 194 ns   594 ns
Integer           : 164 ns 274 ns  1.94 us  -
GMP          mpz_t: 159 ns 266 ns  1.93 us
```
1. == for equal paths
```                     5       100     1000
Uncompressed long : 1.63 us 23   us  212   us ---
Uncompressed short: 1.08 us 14.9 us  137   us
Uncompressed char : 1.08 us 12.5 us  121   us
Compressed   long :  710 ns  4.4 us   38   us
Compressed   short:  723 ns  5.3 us   47.5 us
Semicompr.   long :  498 ns  712 ns    2.4 us
Integer           :  667 ns  658 ns    817 ns
GMP          mpz_t:  466 ns  485 ns    693 ns +++
```
1. == for "easily" unequal paths
```                     5       100     1000
Uncompressed long : 634 ns   686 ns   698 ns
Uncompressed short: 613 ns   603 ns   635 ns
Uncompressed char : 581 ns   599 ns   636 ns
Compressed   long : 692 ns   729 ns   752 ns  --
Compressed   short: 577 ns   588 ns   607 ns
Semicompr.   long : 499 ns   526 ns   545 ns
Integer           : 746 ns   731 ns   733 ns --
GMP          mpz_t: 451 ns   468 ns   482 ns +++
```
1. == for "difficult" unequal paths
```                     5       100      1000
Uncompressed long : 1.65 us   22 us    213 us   ---
Uncompressed short: 1.15 us   12.8 us  140 us
Uncompressed char : 1.06 us   12.7 us  121 us
Compressed   long :  706 ns    4.4 us   37.6 us
Compressed   short:  754 ns    5.3 us   47.3 us
Semicompr.   long :  506 ns    704 ns   2.5 us
Integer           :  710 ns    719 ns   755 ns
GMP          mpz_t:  448 ns    457 ns   470 ns  +++
```
1. startswith (here we only have one example size, one where the answer is True, and one where the answer is False)
```                     yes      no
Uncompressed long :  325 ns  311 ns
Uncompressed short:  336 ns  325 ns
Uncompressed char :  324 ns  315 ns
Compressed   long :  272 ns  257 ns  +
Compressed   short:  272 ns  256 ns  +
Semicompr.   long :  263 ns  232 ns  ++
Integer           : 2.46 us 2.45 us  --
GMP          mpz_t:  673 ns  657 ns
```

So, I don't think there is a clear winner in the competition.

This being said, perhaps it should be implemented like the bitset stuff, in an algebra-free file, to store sequences of integers as you said.

Yes, it would make sense.

How do you define your hashing function and your comparison function, by the way ?

Depends on the implementation. Of course, when I use `Integer` or `mpz_t`, I am using the hash function provided by GMP. When I implement a `block*`, where `block` is either `char`, `uint32_t` or `uint64_t`, I am using some hash function for lists that I found on the internet a while ago (I don't recall where). It is as follows:

```    def __hash__(self):
        cdef block h
        cdef block* ptr
        h = <block>(self.start^(self.end<<16)^(self._len))
        ptr = <block *>self.data
        cdef size_t i
        h ^= FNV_offset
        for i from 0<=i<self._len:
            h ^= ptr[i]
            h *= FNV_prime
        if h==-1:
            return -2
        return h
```

where `FNV_prime` and `FNV_offset` depend on the architecture. I am surprised that this is faster than GMP's hash function.
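For reference, here is a plain-Python sketch of that FNV-1a-style scheme. The 64-bit offset basis and prime below are the standard FNV constants; the function name and the exact seed mixing are only illustrative, not the ticket's actual code:

```python
# Illustrative FNV-1a-style hash over a sequence of small integers,
# mirroring the Cython snippet above (64-bit constants assumed).
FNV_OFFSET = 0xcbf29ce484222325   # standard 64-bit FNV offset basis
FNV_PRIME = 0x100000001b3         # standard 64-bit FNV prime
MASK = (1 << 64) - 1              # emulate 64-bit unsigned overflow in Python

def hash_bounded_sequence(data, start, end):
    # seed from start point, end point and length, as in the snippet
    h = (start ^ (end << 16) ^ len(data)) & MASK
    h ^= FNV_OFFSET
    for x in data:
        h ^= x
        h = (h * FNV_PRIME) & MASK  # multiply inside the loop (FNV-1a)
    # CPython reserves -1 as an error code for __hash__; as a signed
    # 64-bit value, -1 is the all-ones word, i.e. MASK here
    return -2 if h == MASK else h
```

Since multiplication by an odd constant is a bijection modulo `2^64`, sequences differing in their last entry are guaranteed to hash differently, which matches the good collision behaviour reported above.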

### comment:15 Changed 7 years ago by SimonKing

• Description modified (diff)
• Summary changed from Improve efficiency of the path semigroup to Implement sequences of bounded integers

### comment:16 Changed 7 years ago by SimonKing

• Keywords sequence bounded integer added; quiver path algebra efficiency removed

As Nathann has mentioned, it would make sense to provide a general implementation of sequences of bounded integers (independent of quiver paths), and then quiver paths can inherit from it later. That's why I changed the ticket description.

### comment:17 Changed 7 years ago by SimonKing

• Dependencies #12630 deleted

### Changed 7 years ago by SimonKing

A proof of concept

### comment:18 Changed 7 years ago by SimonKing

For completeness, I have added `Path_v6`, which uses Python's `tuple` under the hood (which is an obvious way to implement paths, too).

Result:

```sage: %timeit p5*q5
1000000 loops, best of 3: 439 ns per loop
sage: %timeit p5*q25
1000000 loops, best of 3: 542 ns per loop
sage: %timeit p25*q25
1000000 loops, best of 3: 612 ns per loop
sage: %timeit p25*q50
1000000 loops, best of 3: 869 ns per loop
sage: %timeit p50*q50
1000000 loops, best of 3: 1.1 us per loop
sage: %timeit p50*q100
1000000 loops, best of 3: 1.39 us per loop
sage: %timeit p100*q100
1000000 loops, best of 3: 1.41 us per loop
sage: %timeit p100*q1000
100000 loops, best of 3: 5.04 us per loop
sage: %timeit p1000*q1000
100000 loops, best of 3: 11 us per loop
sage: %timeit hash(p5)
1000000 loops, best of 3: 245 ns per loop
sage: %timeit hash(p25)
1000000 loops, best of 3: 532 ns per loop
sage: %timeit hash(p50)
1000000 loops, best of 3: 884 ns per loop
sage: %timeit hash(p100)
1000000 loops, best of 3: 1.63 us per loop
sage: %timeit hash(p1000)
100000 loops, best of 3: 14.6 us per loop
sage: %timeit p5==s5
1000000 loops, best of 3: 885 ns per loop
sage: %timeit p25==s25
1000000 loops, best of 3: 1.71 us per loop
sage: %timeit p50==s50
100000 loops, best of 3: 2.63 us per loop
sage: %timeit p100==s100
100000 loops, best of 3: 4.53 us per loop
sage: %timeit p1000==s1000
10000 loops, best of 3: 39.3 us per loop
sage: %timeit p5==r5
1000000 loops, best of 3: 785 ns per loop
sage: %timeit p25==r25
1000000 loops, best of 3: 829 ns per loop
sage: %timeit p50==r50
1000000 loops, best of 3: 799 ns per loop
sage: %timeit p100==r100
1000000 loops, best of 3: 791 ns per loop
sage: %timeit p1000==r1000
1000000 loops, best of 3: 822 ns per loop
sage: %timeit q5==r5
1000000 loops, best of 3: 1.48 us per loop
sage: %timeit q25==r25
100000 loops, best of 3: 3.72 us per loop
sage: %timeit q50==r50
100000 loops, best of 3: 6.63 us per loop
sage: %timeit q100==r100
100000 loops, best of 3: 12.5 us per loop
sage: %timeit q1000==r1000
10000 loops, best of 3: 116 us per loop
sage: %timeit q1000.startswith(q100)
100000 loops, best of 3: 4.7 us per loop
sage: %timeit q1000.startswith(r100)
100000 loops, best of 3: 4.72 us per loop
```

The comparison with the other implementations becomes this:

1. Concatenation
```                     5x5   50x50  1000x1000
Uncompressed long : 739 ns  885 ns 3.06 us  - -
Uncompressed short: 700 ns  808 ns 2.13 us
Uncompressed char : 679 ns  772 ns 1.31 us    +
Compressed   long : 676 ns  759 ns 1.61 us
Compressed   short: 718 ns  722 ns 1.98 us   +
Semicompr.   long : 684 ns  827 ns 1.42 us
Integer           : 645 ns 1.82 us 1.78 us   -
GMP          mpz_t: 723 ns  799 ns 1.41 us
tuple             : 439 ns  1.1 us 11   us  + -
```
1. Hash
```                     5       100     1000
Uncompressed long : 135 ns  333 ns  2.09 us  +
Uncompressed short: 138 ns  383 ns  2.61 us  +--
Uncompressed char : 133 ns  374 ns  2.53 us  +
Compressed   long : 132 ns  155 ns   466 ns  +++
Compressed   short: 132 ns  206 ns   951 ns  +
Semicompr.   long : 148 ns  194 ns   594 ns
Integer           : 164 ns  274 ns  1.94 us
GMP          mpz_t: 159 ns  266 ns  1.93 us
tuple             : 245 ns 1.63 us 14.6  us  ---
```
1. == for equal paths
```                     5       100     1000
Uncompressed long : 1.63 us 23   us  212   us ---
Uncompressed short: 1.08 us 14.9 us  137   us
Uncompressed char : 1.08 us 12.5 us  121   us
Compressed   long :  710 ns  4.4 us   38   us
Compressed   short:  723 ns  5.3 us   47.5 us
Semicompr.   long :  498 ns  712 ns    2.4 us
Integer           :  667 ns  658 ns    817 ns
GMP          mpz_t:  466 ns  485 ns    693 ns +++
tuple             :  885 ns  4.5 us   39.3 us
```
1. == for "easily" unequal paths
```                     5       100     1000
Uncompressed long : 634 ns   686 ns   698 ns
Uncompressed short: 613 ns   603 ns   635 ns
Uncompressed char : 581 ns   599 ns   636 ns
Compressed   long : 692 ns   729 ns   752 ns
Compressed   short: 577 ns   588 ns   607 ns
Semicompr.   long : 499 ns   526 ns   545 ns
Integer           : 746 ns   731 ns   733 ns
GMP          mpz_t: 451 ns   468 ns   482 ns +++
tuple             : 785 ns   791 ns   822 ns ---
```
1. == for "difficult" unequal paths
```                     5       100      1000
Uncompressed long : 1.65 us   22   us  213 us   ---
Uncompressed short: 1.15 us   12.8 us  140 us
Uncompressed char : 1.06 us   12.7 us  121 us
Compressed   long :  706 ns    4.4 us   37.6 us
Compressed   short:  754 ns    5.3 us   47.3 us
Semicompr.   long :  506 ns    704 ns   2.5 us
Integer           :  710 ns    719 ns   755 ns
GMP          mpz_t:  448 ns    457 ns   470 ns  +++
tuple             : 1.48 us   12.5 us  116 us
```
1. startswith
```                     yes      no
Uncompressed long :  325 ns  311 ns
Uncompressed short:  336 ns  325 ns
Uncompressed char :  324 ns  315 ns
Compressed   long :  272 ns  257 ns  +
Compressed   short:  272 ns  256 ns  +
Semicompr.   long :  263 ns  232 ns  ++
Integer           : 2.46 us 2.45 us
GMP          mpz_t:  673 ns  657 ns
tuple             : 4.7  us 4.72 us  --
```

So, using tuples is best for concatenation of short paths, but that's the only case where it is not too slow.

### comment:19 follow-up: ↓ 20 Changed 7 years ago by ncohen

Well, given the outcomes I guess it mostly depends on your own practical applications ?... `:-)`

I also wonder (because I hate them) how much time is spent dealing with parents. But all your implementations have to go through that at the moment so it does not change your comparisons.

Well, I am back to work at long long last. With a computer AND a screen. And in Brussels, which counts for the good mood `:-P`

Have fuuuuuuuuuuuuuun !

Nathann

### comment:20 in reply to: ↑ 19 Changed 7 years ago by SimonKing

Hi Nathann,

I also wonder (because I hate them) how much time is spent dealing with parents.

Well, at some point parents have to come into play, namely when I want to implement not just sequences of bounded integers, but elements of the path semigroup of a quiver.

The strategic question is: Should the data defining a path be stored in an attribute of that path (the attribute would be an instance of `BoundedIntegerSequence`), or should the path itself be an instance of `BoundedIntegerSequence`? In the former case, it would be perfectly fine to have sequences of bounded integers parent-less; but I worry about the overhead of accessing the attribute. In the latter case, `BoundedIntegerSequence` should already be a sub-class of `sage.structure.element.Element` (and have a parent), since Cython does not allow multiple inheritance.

I tend to the latter solution also for a different reason: Instances of `BoundedIntegerSequence` have to know the bound for their elements, because the data storage depends on that bound. For efficiency (see attachment), several other constants are derived from that bound, and used in the algorithms.

Should each instance of `BoundedIntegerSequence` know about all these bounds? Or should there be a common parent for all sequences of integers that are bounded by a number B?

I tend to the latter. If you assign several ints (the afore-mentioned constants) to an instance of `BoundedIntegerSequence`, it will likely be more time consuming (during initialisation) than to store just a single data (namely the parent).

That said, looking up the constants via the parent would be more time-consuming than looking up constants that are stored directly in the element.

Difficult....

### comment:21 follow-up: ↓ 22 Changed 7 years ago by ncohen

Hellooooo Simon !

Hmmmm... All this is important to settle before implementing those `BoundedIntegerSequence` objects somewhere indeed.

What I thought you intended was to write a very low-level data structure somewhere in the misc/ folder. The kind of stuff that one does not use by mistake, the kind of stuff that one does not use without reading the manual first. So I thought it would not exactly check its input, blindly trusting the developer and being as efficient as possible.

Then, I thought you would have higher-level algebraic stuff with parents, colors, and blinking lights that would check its input indeed, and be what we want a user-friendly object to be.

But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want to create high-level objects in those functions or not ?

If you don't, then perhaps you would be better with only low-level objects that you know how to use, which would not have to waste time (if time is actually wasted) on high-level problems.

And so I thought that you high-level objects would have a variable which would be this low-level object, etc, etc...

What do you have in mind ?

Nathann

### comment:22 in reply to: ↑ 21 ; follow-up: ↓ 23 Changed 7 years ago by SimonKing

Hi Nathann,

What I thought you intended was to write a very low-level data structure somewhere in the misc/ folder.

I somehow do.

The kind of stuff that one does not use by mistake,

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Then, I thought you would have higher-level algebraic stuff with parents, colors, and blinking lights that would check its input indeed, and be what we want a user-friendly object to be.

Again ideally: Blinking colours should only be added when one really implements paths in quivers.

But I actually have only one question: I don't know how you intend to use all this in the end, but I got the impression that several functions may have to generate a LOT of paths in order to compute some small data, and throw all the paths away once the computations have ended. Do you want to create high-level objects in those functions or not ?

Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of `BoundedIntegerSequence`, and only in the end interpret them as paths, when the battle smoke has vanished.

And so I thought that you high-level objects would have a variable which would be this low-level object, etc, etc...

Yes, probably that's better.

So, perhaps I should really do `cdef class BoundedIntegerSequence(tuple)` rather than `cdef class BoundedIntegerSequence(MonoidElement)`.

But then, what to do with all these internally used constants that are needed even for the low-level functionality of `BoundedIntegerSequence`? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by `B`? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?

Or do you have a third solution for storing/accessing these constants efficiently?

### comment:23 in reply to: ↑ 22 ; follow-up: ↓ 24 Changed 7 years ago by ncohen

Helloooooo !!

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Why do you want it to inherit from tuple ? I do not know the effect of such a thing, what it brings and what should be feared.

Good point. So, you mean that these functions should actually not create fully-fledged quiver paths, but should just operate on instances of `BoundedIntegerSequence`, and only in the end interpret them as paths, when the battle smoke has vanished.

Well, yep. Only create the high-level stuff with possible overhead when you need it. And you end up knowing exactly what your code does when you are writing the code of your time-critical function.

So, perhaps I should really do `cdef class BoundedIntegerSequence(tuple)` rather than `cdef class BoundedIntegerSequence(MonoidElement)`.

Hmmm.. I still don't get why you want it to inherit from tuple. My natural move would be to implement this as a C structure, not even an object `:-P`

But then, what to do with all these internally used constants that are needed even for the low-level functionality of `BoundedIntegerSequence`? Would you prefer to compute and store these constants 10000 times when creating 10000 sequences of integers bounded by `B`? Or would you prefer to compute and store these constants only once, namely in a parent that hosts all sequences of integers with a fixed bound?

Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files. You could have a `BoundedIntegerSequence` class implementing no function at all, extended by all different implementations of the data structures you want. Even in the same file.

... And we don't need parents for that `:-PPPP`

Nathann

### comment:24 in reply to: ↑ 23 ; follow-up: ↓ 25 Changed 7 years ago by SimonKing

Why? It should be a replacement for tuples (thus, low-level), but it should ideally be (close to) a drop-in replacement (thus, it doesn't matter whether one uses it by mistake).

Why do you want it to inherit from tuple ?

By analogy to `Sequence_generic` (which inherits from `list`):

```sage: S = Sequence([1,2,3])
sage: type(S)
<class 'sage.structure.sequence.Sequence_generic'>
sage: type(S).mro()
[sage.structure.sequence.Sequence_generic,
 sage.structure.sage_object.SageObject,
 list,
 object]
```

Well, somehow that's already what we do with graph backends. We have a Generic Backend, extended twice in Dense Graphs and Sparse Graphs. And all the constants needed by the data structures are stored in the corresponding files.

No, that's a different situation. If you have a sequence `S1` of integers bounded by `B1` and a sequence `S2` of integers bounded by `B2` then the constants for `S1` are different from the constants of `S2`. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file), but when you do, for example, `for x in B1` then you'll need different constants than when you do `for x in B2`.

So, these constants are not shared by all sequences, but by all sequence that share the same bound. Thus, it seems natural (to me, at least) to have one object `O(B)` that is shared by all sequences of bound `B`, so that `O(B)` provides the constants that belong to `B`.

Sure, `O(B)` could be written in C as well. But why not make it a parent, when all we do is let each sequence have a pointer to `O(B)`?
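The `O(B)` idea in minimal Python form (a sketch with hypothetical names, not actual ticket code): one shared object per bound `B` precomputes the packing constants once, and every sequence keeps just one reference to it.

```python
# Sketch: a shared "parent-like" object caching packing constants for bound B.
class BoundData:
    _cache = {}  # one instance per bound, shared by all sequences

    def __new__(cls, bound):
        if bound not in cls._cache:
            obj = super().__new__(cls)
            obj.bound = bound
            # bits needed per entry for values in 0 .. bound-1
            obj.item_bits = max(1, (bound - 1).bit_length())
            obj.items_per_word = 64 // obj.item_bits  # packing into a uint64
            obj.item_mask = (1 << obj.item_bits) - 1  # extraction mask
            cls._cache[bound] = obj
        return cls._cache[bound]

class BoundedIntegerSequence:
    def __init__(self, bound, items):
        self.consts = BoundData(bound)  # one pointer; constants computed once
        self.items = tuple(items)
```

Two sequences with the same bound then share one constants object, so creating 10000 sequences computes the derived constants only once.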

### comment:25 in reply to: ↑ 24 ; follow-up: ↓ 26 Changed 7 years ago by ncohen

By analogy to `Sequence_generic` (which inherits from `list`):

?...

Well. And why is that a good idea ? `:-P`

No, that's a different situation. If you have a sequence `S1` of integers bounded by `B1` and a sequence `S2` of integers bounded by `B2` then the constants for `S1` are different from the constants of `S2`. So, it is not really a "global" constant that is shared by all instances of a certain data structure (and could thus be put into a file), but when you do, for example, `for x in B1` then you'll need different constants than when you do `for x in B2`.

Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ? If it is, did you decide whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something `:-)`

I'd say.... 4,8,16,32 ? `:-)`

So, these constants are not shared by all sequences, but by all sequence that share the same bound. Thus, it seems natural (to me, at least) to have one object `O(B)` that is shared by all sequences of bound `B`, so that `O(B)` provides the constants that belong to `B`.

If your data structure is not an object, you could also pass this width parameter as an input to the functions you use ? I guess all bounded integer sequences that you use in the same function would have the same bound.

But admittedly I love to split hairs. Tell me if you think that this is going too far `:-P`

Sure, `O(B)` could be written in C as well. But why not making it a parent, when all what we do is letting each sequence have a pointer to `O(B)`?

Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happily with a C pointer I can control, but not with a framework that I do not understand. If you understand it, then you know what you are doing. I don't `:-P`

Nathann

### comment:26 in reply to: ↑ 25 ; follow-ups: ↓ 27 ↓ 30 Changed 7 years ago by SimonKing

Hmmmm... I see. Well, isn't your only parameter the number of bits you need to store each entry ?

No. We have

```cdef Integer NumberArrows
cdef size_t ArrowBitSize, BigArrowBitSize
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize
```

For example, `ModBAPB` is used to replace `n%BigArrowsPerBlock` by a bit-wise `and`, which is noticeably faster.
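The trick can be sketched as follows (names illustrative, not the ticket's actual constants; it only works because the block size is a power of two):

```python
# Sketch: replacing % and // by bit operations when the divisor is 2**k.
k = 5
BIG_ARROWS_PER_BLOCK = 1 << k        # 32 entries per block
MOD_MASK = BIG_ARROWS_PER_BLOCK - 1  # plays the role of ModBAPB

def offset_in_block(n):
    return n & MOD_MASK  # same result as n % BIG_ARROWS_PER_BLOCK

def block_index(n):
    return n >> k        # same result as n // BIG_ARROWS_PER_BLOCK
```

Precomputing the mask and the shift once per bound is exactly what makes storing them (in a parent, or wherever) worthwhile.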

If it is, did you decided whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?

Depending on the implementation...

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something `:-)`

Isn't duplication of code supposed to smell?

If your data structure is not an object, you could also add this width parameter as an input of the functions you use ? I guess all Bounded Sequence Integer that you use in the same function would have the same bound.

Do I understand correctly: You suggest that we should not provide a class `BoundedIntegerSequence` that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, `mpz_t` or `unsigned long*`) and can only be used in Cython code?

Then it would be better to revert this ticket to its original purpose: There would be...

1. ... cdef functions operating on `mpz_t` resp. on `unsigned long*`,
2. ... a Cython class `Path` using `mpz_t` resp. `unsigned long*` as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with `Path` but with `mpz_t*` resp `unsigned long**` (which will be the case anyway).

But admittedly I love to split hairs. Tell me if you think that this is going too far `:-P`

No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.

Ahahahah. That's up to you. I just hate Sage's parents/elements. I would live happy with a C pointer I can control, but not witha framework that I do not understand.

Making it something like a container rather than a fully fledged parent would be an option, from my perspective.

### comment:27 in reply to: ↑ 26 Changed 7 years ago by SimonKing

Making it something like a container rather than a fully fledged parent would be an option, from my perspective.

... I mean, for `BoundedIntegerSequence`. Of course, `Path` must have a parent (the path semigroup of a quiver).

### comment:28 Changed 7 years ago by nthiery

Just a random thought. Some issues here are of the same nature as what was discussed around `ClonableElement` and its subclasses. Maybe there is stuff to share with it, especially if one ends up inheriting from Element.

### comment:29 Changed 7 years ago by SimonKing

I just had a look at `ClonableElement`, and one immediate thought is: Why are C-pointers allocated in `__init__`? That's clearly the purpose of `__cinit__`, in particular if said C-pointers are freed in `__dealloc__`.

Do we want to have sequences of bounded integers for use in Python? Then one possible way would be:

• have some inline cdef functions that operate on certain (templated?) C data structures
• use these in a sub-class of `ClonableElement` (so that we can have sequences of bounded integers used in python)
• use these in a `Path` class, to be used in path semigroups.

### comment:30 in reply to: ↑ 26 ; follow-up: ↓ 31 Changed 7 years ago by ncohen

Yooooooo !!!

No. We have

```cdef Integer NumberArrows
cdef size_t ArrowBitSize, BigArrowBitSize
cdef size_t ArrowsPerBlock, BigArrowsPerBlock
cdef size_t ModBAPB, DivBAPB, TimesBABS, TimesBlockSize
```

Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?

For example, `ModBAPB` is used to replace `n%BigArrowsPerBlock` by a bit-wise `and`, which is noticeably faster.

Yepyep of course.

If it is, did you decide whether you want to allow it to be arbitrary, or if you want it to be a power of 2 ?

Depending on the implementation...

Okay....

If it is a power of 2, then perhaps templates are sufficient, i.e. "one data structure per value of this parameter". There are not so many powers of 2 between 1 and 64 that make sense to store something `:-)`

Isn't duplication of code supposed to smell?

Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.

Do I understand correctly: You suggest that we should not provide a class `BoundedIntegerSequence` that can be used interactively in Sage, but you suggest that we should just provide a couple of functions that operate on C structures (say, `mpz_t` or `unsigned long*`) and can only be used in Cython code?

That's what I had in mind, but you make decisions here, pick what you like.

It's a bit how bitsets are implemented. A C data structure, and an object wrapper on top of it. Or C Graphs, with a data structure on top of it. It is how Permutations should be implemented, too `:-P`

Well, you have a C object that you use when you want performance, and when you don't care you use the higher-level implementations.

Then it would be better to revert this ticket to its original purpose: There would be...

1. ... cdef functions operating on `mpz_t` resp. on `unsigned long*`,
2. ... a Cython class `Path` using `mpz_t` resp. `unsigned long*` as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with `Path` but with `mpz_t*` resp `unsigned long**` (which will be the case anyway).

Well, I like this plan. What do you think ? Does it displease you in any way ?

No problem. I just want to know if people see the need to have a Cython implementation of sequences of bounded integers, for use in Python.

I do not see how I could use it in my code at the moment, but I like the idea of having such a data structure somewhere, ready to be used. Most importantly, if you want speed, I think that having it available at two different levels is a safe bet. You write more code, but you will know where the time goes in your time-critical functions.

Nathann

### comment:31 in reply to: ↑ 30 ; follow-up: ↓ 33 Changed 7 years ago by SimonKing

Hi Nathann,

Hmmmm.. Can't they all be deduced from the type of implementation (compressed long, for instance) and the number of bits per entry then ?

Sure. And you want to do this same computation `10^7` times?

Isn't duplication of code supposed to smell?

Well, if it is a template the code is only implemented once. And it will generate several data structures at compile time indeed, but each of them will have a hardcoded constant, and you only implement one.

Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).

That's what I had in mind, but you make decisions here, pick what you like.

It's a bit how bitsets are implemented.

Right, that's what I have looked at, and think makes sense here. Up to the constants.

Then it would be better to revert this ticket to its original purpose: There would be...

1. ... cdef functions operating on `mpz_t` resp. on `unsigned long*`,
2. ... a Cython class `Path` using `mpz_t` resp. `unsigned long*` as a cdef attribute, using the aforementioned functions to implement concatenation and iteration,
3. ... a subsequent ticket, implementing an F5 style algorithm to compute standard bases, operating not with `Path` but with `mpz_t*` resp `unsigned long**` (which will be the case anyway).

Well, I like this plan. What do you think ? Does it displease you in any way ?

No, it is one possibility.

### comment:33 in reply to: ↑ 31 ; follow-up: ↓ 34 Changed 7 years ago by ncohen

Sure. And you want to do this same computation `10^7` times?

I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?

Yes. Probably my problem is that I don't think "C++", but I think "Cython". And there it is a bit clumsy (I think templating is mimicked in Cython by including .pxi files, isn't it?).

I am thinking of this : http://stackoverflow.com/questions/4840819/c-template-specialization-with-constant-value

That's C++, and I don't see how to do it in Cython, though. The point is that you can parametrize a data structure with a constant value, and the data structure will be compiled for each value of the constant *that appears in the code at compile time* (not all possible values, of course). The code is thus optimized with the value known (which can enable optimizations), and you can also do things that would not be possible otherwise, like `cdef int a[k]` where `k` is a (templated) variable.

Nathann

### comment:34 in reply to: ↑ 33 ; follow-up: ↓ 35 Changed 7 years ago by SimonKing

Sure. And you want to do this same computation `10^7` times?

I don't get it. If the width is a power of two, then there are 6 possible values, i.e. 2^1, 2^2, 2^3, 2^4, 2^5, 2^6. Otherwise you have at most 64, but really who cares in the end ?

Apparently I am not used to templated code...

I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing it as attribute of the sequences) for each sequence.

I am thinking of this : http://stackoverflow.com/questions/4840819/c-template-specialization-with-constant-value

That's C++, and I don't see how to do it in Cython, though.

Part of the question is: Do we want to use gmp as backend? Or do we want to (re-)implement operations on densely packed `long*` by ourselves? Given the timings, gmp is an option.

Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!

The next question then is how to access the code from Cython. I've done this with C, but not C++.

### comment:35 in reply to: ↑ 34 ; follow-up: ↓ 37 Changed 7 years ago by ncohen

Apparently I am not used to templated code...

I was somehow thinking that you suggested to keep the bitlength of the upper bound for the integers as the only parameter, which means that you would need to compute all the other things either for each operation or (when storing it as attribute of the sequences) for each sequence.

Nonono. Somehow, you can think of templates as a way to "script" the generation of code. You write some code with a constant k which is never defined anywhere, and then you say "compile one version of this code for each k in 1, 2, 3, 4, 5, 6". You end up with several data structures implemented, each with a hardcoded value. And then you can create an object `cdef LongCompressedPath<5> my_path` in which the hardcoded bound is `2^5`. You just have many additional types, each with its own constant.
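For readers more at home in Python than C++: the "one specialised version per constant" idea can be imitated (at run time rather than compile time) with a closure factory that bakes the constant in. This is only an illustrative sketch; the names `make_packer`, `pack5` and `pack6` are invented for this example and appear nowhere in the ticket's code.

```python
# A rough Python analogue of C++ template specialization: instead of one
# generic function that reads the bound at every call, we generate one
# specialized function per constant, with the mask precomputed.

def make_packer(item_bits):
    """Return a function packing a list of integers < 2**item_bits
    into a single Python int, with item_bits baked into the closure."""
    mask = (1 << item_bits) - 1  # computed once, at "instantiation" time

    def pack(items):
        data = 0
        for x in reversed(items):  # item 0 ends up in the lowest bits
            data = (data << item_bits) | (x & mask)
        return data

    return pack

# "Instantiate" the template for the constants we actually use:
pack5 = make_packer(5)   # items bounded by 2^5
pack6 = make_packer(6)   # items bounded by 2^6
```

Unlike C++ templates, the specialisation here happens when `make_packer` is called, not when the module is compiled, but the effect on the call sites is similar: each packer carries its own hardcoded constant.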

Part of the question is: Do we want to use gmp as backend? Or do we want to (re-)implement operations on densely packed `long*` by ourselves? Given the timings, gmp is an option.

Yep, you are right. It depends on how you weight the operations that you will need: it looks okay for concatenation and bad for startswith, but I expect that the most time consuming operation will be concatenation ? You are the one who needs it `:-)`

Anyway, I'll have a look at the stackoverflow discussion. Thank you for the pointer!

The next question then is how to access the code from Cython. I've done this with C, but not C++.

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cython-devel (did you want to send one yourself) ?

Nathann

### comment:36 Changed 7 years ago by vbraun_spam

• Milestone changed from sage-6.2 to sage-6.3

### comment:37 in reply to: ↑ 35 ; follow-up: ↓ 42 Changed 7 years ago by SimonKing

Hi Nathann,

sorry for the long silence!

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cython-devel (did you want to send one yourself) ?

Did you find out how templated code can be written in Cython? If you have asked on cython-devel (so far I have only posted on cython-users) or have another pointer, then please tell me. If not, tell me also...

Cheers, Simon

### comment:38 Changed 7 years ago by SimonKing

Hi Nathann,

wouldn't the following be close to templates?

In Cython:

```cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    return var+var

def show_me():
    cdef str a = 'a'
    cdef int b = 3
    print 'str', times_two(a)
    print 'int', times_two(b)
```

and this results in

```sage: show_me()
str aa
int 6
```

### comment:39 Changed 7 years ago by SimonKing

Or perhaps a better example, showing that the fused type is used to choose code at compile time: Put the following into a cell of the notebook

```%cython
cimport cython

ctypedef fused str_or_int:
    str
    int
cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var*2
    else:
        return var+var
def test():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)
```

and look at the resulting code. The line `a = times_two(a)` becomes

``` __pyx_v_a = __pyx_fuse_1__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_a, 0);
```

whereas `b = times_two(b)` becomes

```__pyx_t_1 = __pyx_fuse_0__pyx_f_68_home_king__sage_sage_notebook_sagenb_home_admin_2_code_sage4_spyx_0_times_two(__pyx_v_b, 0);
```

and one can check that Cython created two different C-functions `__pyx_fuse_0...` and `__pyx_fuse_1...` for the two possible (type-dependent!) versions of `times_two`.

### comment:40 follow-up: ↓ 41 Changed 7 years ago by SimonKing

And here an example showing that it pays off, speed-wise:

In a .pyx file:

```cimport cython

ctypedef fused str_or_int:
    str
    int

cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+var

cpdef untyped_times_two(var):
    return var+var

cpdef int int_times_two(int var):
    return var+var

cpdef str str_times_two(str var):
    return var+var

def test1():
    cdef int a = 3
    cdef str b = 'b'
    a = times_two(a)
    b = times_two(b)

def test2():
    cdef int a = 3
    cdef str b = 'b'
    a = untyped_times_two(a)
    b = untyped_times_two(b)

def test3():
    cdef int a = 3
    cdef str b = 'b'
    a = int_times_two(a)
    b = str_times_two(b)
```

Attach the .pyx file to Sage, and get this:

```sage: %attach /home/king/Sage/work/templates/test.pyx
Compiling /home/king/Sage/work/templates/test.pyx...
sage: %timeit test1()
10000000 loops, best of 3: 159 ns per loop
sage: %timeit test2()
1000000 loops, best of 3: 199 ns per loop
sage: %timeit test3()
10000000 loops, best of 3: 168 ns per loop
```

EDIT: I have replaced the original example by something fairer, where it says `var+var` everywhere, but in the templated version Cython can choose the c-implementation of `var+var` at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.

EDIT 2: The new example also shows that choosing a specialised implementation manually is not faster than Cython using templates.

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:41 in reply to: ↑ 40 ; follow-up: ↓ 43 Changed 7 years ago by ncohen

Hellooooooooooooooooo !!!

EDIT: I have replaced the original example by something fairer, where it says `var+var` everywhere, but in the templated version Cython can choose the c-implementation of `var+var` at compile time, whereas it can't (and hence loses time when running the code) in the untemplated version.

HMmmmmm... Does it make any difference ? gcc should resolve those "if" at compile-time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary as the constants involved can be defined at compile-time.

Anyway, that's a good news ! `:-D`

Nathann

### comment:42 in reply to: ↑ 37 Changed 7 years ago by ncohen

Yooooooooooo !!

HMmmmmmm.... There is a C++ interface in numerical/backends/coin_backend.pyx but it does not play with templates. I am interested in knowing if this kind of templates can be written in pure Cython, do you mind if I ask the question on cython-devel (did you want to send one yourself) ?

Did you find out how templated code can be written in Cython? If you have asked on cython-devel (so far I have only posted on cython-users) or have another pointer, then please tell me. If not, tell me also...

Ahahah.. Well, no ... When I ask a question I do not do anything before I get an answer, soooooooooo I never did it `:-P`

I ask a question, then forget everything about it. When I get the answer I do what should be done `:-P`

Nathann

### comment:43 in reply to: ↑ 41 ; follow-up: ↓ 44 Changed 7 years ago by SimonKing

Hi Nathann,

HMmmmmm... Does it make any difference ? gcc should resolve those "if" at compile-time, so if Cython creates two versions of the function with an "if" inside depending on the type, gcc should not include it in the final binary as the constants involved can be defined at compile-time.

Yes, it is indeed resolved at compile time. Two c-functions are created, and only the part of the code relevant for the respective code branch appears in the c-function.

In other words, I now plan to create cdef functions like this:

```ctypedef fused seq_t:
    first_implementation
    second_implementation

cdef seq_t concat(seq_t x, seq_t y):
    if seq_t is first_implementation:
        <do stuff for 1st implementation>
    else:
        <do stuff for 2nd implementation>
```

and then, in other modules (perhaps also in `src/sage/combinat/words/word_datatypes.pyx`?), one could uniformly call `concat(a,b)`, provided that the types of `a` and `b` are known at compile time (or write `concat[first_implementation](a,b)` explicitly).

I think that such Cython code will be sufficiently nice to read, and the resulting C-code will be sufficiently fast.

Anyway, that's a good news ! `:-D`

Indeed!

Simon

### comment:44 in reply to: ↑ 43 Changed 7 years ago by ncohen

I think that such Cython code will be sufficiently nice to read, and the resulting C-code will be sufficiently fast.

+1

Nathann

### comment:45 Changed 7 years ago by SimonKing

Hmmm. Meanwhile I am a bit less positive about fused types. Let `seq_t` be a fused type for different implementations.

• Fused types can not be used in iterators. I.e., the following is refused:
```def iter_seq(seq_t S):
    for <bla>:
        <do something>
        yield x
```
• Fused types can not be used as attributes of cdef classes. I.e., the following is refused:
```cdef class BoundedIntSeq:
    cdef seq_t seq
    def __mul__(self, other):
        <do type check>
        cdef seq_t out = concat_seq(self.seq, other.seq)
        <create and return new instance of BoundedIntSeq using "out">
```

Hence, it seems that one needs to create two separate iterators, and it also seems that one needs code duplication. I.e., when `impl1` and `impl2` are two specialisations of the fused type `seq_t`, then one has to have

```cdef class BoundedIntSeq_impl1:
    cdef impl1 seq
    def __mul__(self, other):
        ...
        cdef impl1 out = concat_seq(self.seq, other.seq)

cdef class BoundedIntSeq_impl2:
    cdef impl2 seq
    def __mul__(self, other):
        ...
        cdef impl2 out = concat_seq(self.seq, other.seq)
```

This mainly amounts to cut-and-paste, but is ugly!

I think I'll ask on cython-users if there are known ideas to solve this more elegantly.

### comment:46 Changed 7 years ago by SimonKing

Some variation on the theme: One can create fused types from two cdef classes, and then one can work with specialisations of functions as follows:

```ctypedef fused str_or_int:
    str
    int

# make the function do different things on different types,
# to better see the difference
cpdef str_or_int times_two(str_or_int var):
    if str_or_int is int:
        return var+var
    else:
        return var+'+'+var

# forward declaration of cdef classes and their fusion
cdef class Bar_int
cdef class Bar_str
cdef fused Bar:
    Bar_int
    Bar_str

# a function that can be specialised to either of the two extension classes
def twice(Bar self):
    # and inside, we specialise a function depending
    # on the type of an attribute --- at compile time!!
    return type(self)(times_two(self.data))

cdef class Bar_str:
    cdef str data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return self.data
    # First possibility: Use the "templated" function as a method
    twice_ = twice[Bar_str]
    # Second possibility: Wrap the "templated" function
    cpdef Bar_str twice__(self):
        return twice(self)

# The same, with the other specialisation of <Bar>
cdef class Bar_int:
    cdef int data
    def __init__(self, data):
        self.data = data
    def __repr__(self):
        return "int(%d)"%self.data
    twice_ = twice[Bar_int]
    cpdef Bar_int twice__(self):
        return twice(self)
```

Time-wise, we see that wrapping the "templated" function is slower than using it as an attribute:

```sage: b1 = Bar_str('abc')
sage: b1.twice_()
abc+abc
sage: b1.twice__()
abc+abc
sage: %timeit b1.twice_()
1000000 loops, best of 3: 759 ns per loop
sage: %timeit b1.twice__()
100000 loops, best of 3: 13.7 µs per loop
sage: b2 = Bar_int(5)
sage: b2.twice_()
int(10)
sage: b2.twice__()
int(10)
sage: %timeit b2.twice_()
1000000 loops, best of 3: 539 ns per loop
sage: %timeit b2.twice__()
100000 loops, best of 3: 9.2 µs per loop
```

• We can reduce the code duplication to something like this:
```<define templated function foo_ depending on a fused type Bar with specialisations Bar1, Bar2, ...>
cdef class Bar1:
    foo = foo_[Bar1]
cdef class Bar2:
    foo = foo_[Bar2]
...
```
I hope this level of copy-and-paste can be afforded. At least, it gives good speed.
• In my tests I found the following:
• It is impossible to make `foo_` a cpdef function. Otherwise, the assignment `foo = foo_` fails with the statement that `foo_` cannot be turned into a Python type.
• It is impossible to choose the same name for the templated function and for the method it is turned into. Doing
```class Bar1:
    foo_ = foo_
```
results in an error saying that `Bar1` has no attribute `foo_`.
• With the syntax above, both `twice_` and `twice__` have been put into the class' `__dict__`. I did not succeed in turning them into the equivalent of cdef methods.

### comment:47 Changed 7 years ago by SimonKing

I did a lot of benchmark tests for the different implementations (using `long*` or `char*` or `mpz_t` to store sequences of bounded integers, using cdef functions that operate on fused types) and eventually found that using GMP (i.e., `mpz_t`) is fastest in all tests, by a wide margin.

Why did I not find this result before? Well, part of the reason is that GMP sometimes provides different functions to do the same job, and in the past few days I considerably improved my usage of the various possibilities. For example, `mpz_t` can be initialised in different ways. If one uses `mpz_init`, then the assignment that follows results in a re-allocation. Since the length of the concatenation of two tuples is known in advance, prescribing the number of to-be-allocated bits with `mpz_init2` resulted in a speed-up of concatenation; it actually became twice as fast.

And this result is not surprising. After all, GMP has to implement the same bitshift and comparison operations for bit arrays that I was implementing in the "`long*`" approach, too. But GMP probably uses implementation techniques that I can not even dream of.
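To make the packing idea concrete, here is a minimal pure-Python sketch that uses Python's arbitrary-precision integers as a stand-in for `mpz_t`: a sequence of items below `2^item_bits` is stored as one big integer, and concatenation becomes a single shift-and-or. The function names (`seq_to_int`, `concat`) are illustrative only and do not come from the branch.

```python
def seq_to_int(items, item_bits):
    """Pack items (each < 2**item_bits) into one integer,
    with item 0 occupying the lowest bits."""
    data = 0
    for x in reversed(items):
        data = (data << item_bits) | x
    return data

def concat(data_a, len_a, data_b, item_bits):
    """Concatenate two packed sequences: shift b past a's bits
    and combine with a bitwise or."""
    return data_a | (data_b << (len_a * item_bits))
```

This is exactly the kind of bitshift work that GMP does internally, which is why delegating to `mpz_t` (with the result size preallocated via `mpz_init2`) beats a hand-rolled `long*` implementation.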

Now, my plan is to

• create some `ctypedef struct bounded_int_tuple` that is based on `mpz_t` and is the underlying C structure for tuples of bounded integers
• provide a collection of cdef (inline) functions, operating on `bounded_int_tuple`, for concatenation, copying, comparison and so on
• wrap it in a cdef class `BoundedIntegerTuple`. This will probably not derive from `Element`: I think it makes no difference from the point of view of memory efficiency whether each `BoundedIntegerTuple` stores a pointer to the parent of "all tuples of integers bounded by `B`" or directly stores `B` as an unsigned int.

In a subsequent ticket, I plan to use it for quiver paths, and if the combinat crowd cares, they can use it to make parts of sage.combinat.words a lot faster.

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:48 follow-up: ↓ 51 Changed 7 years ago by ncohen

None from me. I keep being impressed at how much testing/researching you put into these quiver features... I really wonder what you will do with it in the end `;-)`

Nathann

### comment:49 Changed 7 years ago by SimonKing

• Branch set to u/SimonKing/ticket/15820
• Created changed from 02/14/14 17:31:31 to 02/14/14 17:31:31
• Modified changed from 05/25/14 15:39:08 to 05/25/14 15:39:08

### comment:50 Changed 7 years ago by SimonKing

• Authors set to Simon King
• Commit set to 5dc78c5f5f647bf95b98da916d7086fae539ffb8
• Status changed from new to needs_review

I have pushed a branch with an initial implementation of bounded integer sequences. I took care of error handling: I think you will find that those GMP commands that may result in a memory error are wrapped in sig_on()/sig_off(), and the cdef boilerplate functions do propagate errors. In fact, in some of my tests, I was hitting Ctrl-C, and it worked.

The rest of this post is about benchmarks. I compare against Python tuples---which is of course ambitious, and you will find that tuples are not slower than bounded integer sequences in all situations. However, one should see it positively: the bounded integer sequences implemented here have features very similar to tuples, and there are operations for which they are a lot faster than tuples. It all depends on the length and the bound of the sequence. That's why I think it is a reasonable contribution. And to make it faster (without Python classes), one can still use the cdef boilerplate functions. Generally, it seems that long bounded integer sequences are faster than long tuples, but short tuples are faster than short sequences. For hash, bounded integer sequences are pretty good, but they suck at accessing items, or at slicing with a step different from 1.

In contrast to tuples, bounded integer sequences also provide fairly quick tests for whether a sequence starts with a given sequence, or whether a sequence contains a certain sub-sequence. This is a feature I took from strings.
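On a packed representation these string-like tests are cheap for the reason sketched below: `startswith` masks off the low bits of the longer sequence and compares, and a naive `contains` shifts the haystack one item at a time. This is a hypothetical pure-Python illustration (all names invented here), not the code from the branch.

```python
def pack(items, item_bits):
    """Pack items (each < 2**item_bits) into one int, item 0 lowest."""
    data = 0
    for x in reversed(items):
        data = (data << item_bits) | x
    return data

def startswith(data_s, data_p, len_p, item_bits):
    """Does sequence S start with sequence P (of len_p items)?
    One mask and one comparison, regardless of item values."""
    mask = (1 << (len_p * item_bits)) - 1
    return (data_s & mask) == data_p

def contains(data_s, len_s, data_p, len_p, item_bits):
    """Naive subsequence search: shift S item by item, reuse startswith."""
    return any(startswith(data_s >> (i * item_bits), data_p, len_p, item_bits)
               for i in range(len_s - len_p + 1))
```

The linear scan in `contains` also suggests why `in` is the slowest of the benchmarked operations: unlike `startswith`, it cannot avoid trying every offset in the worst case.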

Here are the tests, with different sizes and bounds. Variable names starting with "T" hold tuples, those starting with "S" hold bounded integer sequences.

```sage: from sage.structure.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
```

Iteration

```sage: timeit("x=[y for y in T0]", number=1000000)
1000000 loops, best of 3: 783 ns per loop
sage: timeit("x=[y for y in T1]", number=1000000)
1000000 loops, best of 3: 1.48 µs per loop
sage: timeit("x=[y for y in T2]", number=100000)
100000 loops, best of 3: 4.3 µs per loop
sage: timeit("x=[y for y in T3]", number=1000)
1000 loops, best of 3: 245 µs per loop
sage: timeit("x=[y for y in S0]", number=1000000)
1000000 loops, best of 3: 2.38 µs per loop
sage: timeit("x=[y for y in S1]", number=100000)
100000 loops, best of 3: 4.1 µs per loop
sage: timeit("x=[y for y in S2]", number=100000)
100000 loops, best of 3: 10.1 µs per loop
sage: timeit("x=[y for y in S3]", number=1000)
1000 loops, best of 3: 1.79 ms per loop
```

Slicing

Bounded integer sequences are immutable and hence copied by identity. But let us do slicing, dropping the last item:

```sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.5 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 3.48 µs per loop
```

Slicing with `step!=1` is much slower, though:

```sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=1000)
1000 loops, best of 3: 2.23 ms per loop
```

Perhaps I should mark it "TODO"...
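The asymmetry between the two slicing benchmarks has a simple explanation on a packed representation: dropping trailing items is a single mask, whereas a step different from 1 forces item-by-item extraction and repacking. A pure-Python sketch of both (illustrative names, not the branch's code):

```python
def pack(items, item_bits):
    """Pack items (each < 2**item_bits) into one int, item 0 lowest."""
    data = 0
    for x in reversed(items):
        data = (data << item_bits) | x
    return data

def drop_last(data, length, item_bits):
    """The slice [:-1] is one mask: keep the low (length-1)*item_bits bits."""
    return data & ((1 << ((length - 1) * item_bits)) - 1)

def step_slice(data, length, step, item_bits):
    """A step != 1 requires extracting each selected item and repacking."""
    mask = (1 << item_bits) - 1
    items = [(data >> (i * item_bits)) & mask for i in range(0, length, step)]
    return pack(items, item_bits)
```

So the "TODO" would amount to finding something smarter than the per-item loop, which is not obvious for general steps.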

Accessing single items

Short sequences

```sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 361 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 373 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 959 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 960 ns per loop
```

Large sequences (time also depends on the bound for the integer sequence!)

```sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 359 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 382 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 1.97 µs per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 1.49 µs per loop
```

Comparison

Note that comparison of bounded integer sequences works differently from comparison of tuples or lists, as detailed in the documentation.

We compare sequences that are equal but non-identical, that differ in early items, or that differ in late items.

```sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: T0x = tuple(L0x); T1x = tuple(L1x); T2x = tuple(L2x); T3x = tuple(L3x)
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: T0y = tuple(L0); T1y = tuple(L1); T2y = tuple(L2); T3y = tuple(L3)
sage: S1y = BoundedIntegerSequence(8, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
```

Early differences:

```sage: timeit("T0==T0x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T1==T1x", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("T2==T2x", number=1000000)
1000000 loops, best of 3: 143 ns per loop
sage: timeit("T3==T3x", number=1000000)
1000000 loops, best of 3: 161 ns per loop
sage: timeit("S0==S0x", number=1000000)
1000000 loops, best of 3: 538 ns per loop
sage: timeit("S1==S1x", number=1000000)
1000000 loops, best of 3: 550 ns per loop
sage: timeit("S2==S2x", number=1000000)
1000000 loops, best of 3: 490 ns per loop
sage: timeit("S3==S3x", number=1000000)
1000000 loops, best of 3: 559 ns per loop
```

Equal sequences:

```sage: timeit("T0==T0y", number=1000000)
1000000 loops, best of 3: 169 ns per loop
sage: timeit("T1==T1y", number=1000000)
1000000 loops, best of 3: 255 ns per loop
sage: timeit("T2==T2y", number=1000000)
1000000 loops, best of 3: 597 ns per loop
sage: timeit("T3==T3y", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: timeit("S0==S0y", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1==S1y", number=1000000)
1000000 loops, best of 3: 493 ns per loop
sage: timeit("S2==S2y", number=1000000)
1000000 loops, best of 3: 583 ns per loop
sage: timeit("S3==S3y", number=1000000)
1000000 loops, best of 3: 1.41 µs per loop
```

Late differences:

```sage: T0z1 = T0+T0
sage: T0z2 = T0+T0x
sage: T1z1 = T1+T1
sage: T1z2 = T1+T1x
sage: T2z1 = T2+T2
sage: T2z2 = T2+T2x
sage: T3z1 = T3+T3
sage: T3z2 = T3+T3x
sage: timeit("T0z1==T0z2", number=100000)
100000 loops, best of 3: 206 ns per loop
sage: timeit("T1z1==T1z2", number=100000)
100000 loops, best of 3: 308 ns per loop
sage: timeit("T2z1==T2z2", number=100000)
100000 loops, best of 3: 640 ns per loop
sage: timeit("T3z1==T3z2", number=100000)
100000 loops, best of 3: 47.8 µs per loop
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z1==S0z2", number=100000)
100000 loops, best of 3: 585 ns per loop
sage: timeit("S1z1==S1z2", number=100000)
100000 loops, best of 3: 494 ns per loop
sage: timeit("S2z1==S2z2", number=100000)
100000 loops, best of 3: 555 ns per loop
sage: timeit("S3z1==S3z2", number=100000)
100000 loops, best of 3: 578 ns per loop
```

Hash

This is the only operation for which bounded integer sequences seem to be consistently faster than tuples:

```sage: timeit("hash(T0)", number=100000)
100000 loops, best of 3: 179 ns per loop
sage: timeit("hash(T0)", number=1000000)
1000000 loops, best of 3: 177 ns per loop
sage: timeit("hash(T1)", number=1000000)
1000000 loops, best of 3: 267 ns per loop
sage: timeit("hash(T2)", number=1000000)
1000000 loops, best of 3: 584 ns per loop
sage: timeit("hash(T3)", number=100000)
100000 loops, best of 3: 45.3 µs per loop
sage: timeit("hash(S0)", number=1000000)
1000000 loops, best of 3: 145 ns per loop
sage: timeit("hash(S1)", number=1000000)
1000000 loops, best of 3: 153 ns per loop
sage: timeit("hash(S2)", number=1000000)
1000000 loops, best of 3: 194 ns per loop
sage: timeit("hash(S3)", number=1000000)
1000000 loops, best of 3: 8.97 µs per loop
```

Concatenation

```sage: timeit("T0+T0", number=1000000)
1000000 loops, best of 3: 200 ns per loop
sage: timeit("T1+T1", number=1000000)
1000000 loops, best of 3: 311 ns per loop
sage: timeit("T2+T2", number=1000000)
1000000 loops, best of 3: 742 ns per loop
sage: timeit("T3+T3", number=10000)
10000 loops, best of 3: 40.3 µs per loop
sage: timeit("S0+S0", number=1000000)
1000000 loops, best of 3: 576 ns per loop
sage: timeit("S1+S1", number=1000000)
1000000 loops, best of 3: 590 ns per loop
sage: timeit("S2+S2", number=1000000)
1000000 loops, best of 3: 618 ns per loop
sage: timeit("S3+S3", number=1000000)
1000000 loops, best of 3: 2.17 µs per loop
```

Subsequences

Recall the definition of `S0z1` etc. We find:

```sage: S0z2.startswith(S0)
True
sage: S0z2.startswith(S0x)
False
sage: S0x in S0z2
True
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 239 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 241 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 694 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 227 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 1.08 µs per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 247 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 3.21 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 989 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 218 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 3.57 ms per loop
```

I wonder if the latter could be improved. Another "TODO"...

New commits:

5dc78c5 `Implement sequences of bounded integers`

### comment:51 in reply to: ↑ 48 Changed 7 years ago by SimonKing

None from me. I keep being impressed at how much testing/researching you put into this quiver features... I really wonder what you will do with it in the end `;-)`

• Gröbner bases for modules over path algebra quotients.
• In particular, an implementation of the non-commutative version of Faugère's F5 algorithm that I have described in my latest article.
• Compute minimal generating sets for modules over so-called "basic algebras" (that's a special type of path algebra quotients), which is possible with the non-commutative F5, as shown in my latest article.
• Use all this to compute minimal projective resolutions of basic algebras, but please be faster than what we currently do in our optional group cohomology spkg. Currently, we use a method of David Green for computing the minimal resolutions; but F5 should be more efficient---theoretically...

That said: I guess it would make sense to adopt bounded integer sequences in appropriate parts of sage.combinat.words, but I leave this to others.

### comment:52 follow-up: ↓ 53 Changed 7 years ago by ncohen

Hellooooooo Simon !

Sorry but I only began to read your code and now I should really go get some sleep. I paste my current notes here in the meantime, but I am a long way from a proper review job.

• bounded integer sequence, biseq. What about "integer sequence" ? All integers are "bounded" on a computer. Why add this "bounded" everywhere ? biseq -> iseq ?
• `# Bitsize of "data"` --> bitsize of the whole sequence
• itembitsize --> necessarily a power of 2 ?
• `#cdef inline void dealloc_biseq(biseq_t S)` : why there if commented ?
• `list2biseq` (and others) -> `list_to_biseq` ?
• Comments on the function definitions --> Move it to the function's doc, even if it is their only content ?
• Could you write some doc, even if it is just one line, for cdef functions ? A bit like what we have for bitsets ? Something like
```cdef biseq_t* allocate_biseq(size_t l, unsigned long int itemsize):
    """
    Allocate a sequence of l integers stored on itemsize bits each.
    """
```
• list2biseq --> I wondered why this function wouldn't return a new biseq built from a list instead of modifying an existing one.... Performance issue ? Is that critical in some cases ?

Nathann

### comment:53 in reply to: ↑ 52 Changed 7 years ago by SimonKing

• bounded integer sequence, biseq. What about "integer sequence" ?

Not good, because...

All integers are "bounded" on a computer. Why add this "bounded" everywhere ? biseq -> iseq

... the computer's integer bound is chosen when you buy the machine, but not when you create the sequence.

The point is that each "bounded integer sequence" has a bound `B` that can be chosen upon initialisation so that any item of the sequence is smaller than `B`.

• `# Bitsize of "data"` --> bitsize of the whole sequence

OK.

• itembitsize --> necessarily a power of 2 ?

No. Here, it is really "bit" size. So, the number tells how many bits will be reserved to store one item. Hence, the bound `B` above is a power of two, namely `2^itembitsize`.
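In other words, item `i` of the packed sequence occupies bits `i*itembitsize` through `(i+1)*itembitsize - 1`, so reading it is one shift and one mask. A small pure-Python sketch of the convention (illustrative names, not from the branch):

```python
def pack(items, itembitsize):
    """Pack items (each < 2**itembitsize) into one int, item 0 lowest."""
    data = 0
    for x in reversed(items):
        data = (data << itembitsize) | x
    return data

def getitem(data, i, itembitsize):
    """Read item i: shift its bits down, then mask off the rest.
    The bound B on the items is 2**itembitsize."""
    return (data >> (i * itembitsize)) & ((1 << itembitsize) - 1)
```

Note that `itembitsize` itself need not be a power of two; only the resulting bound `B = 2^itembitsize` is.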

• `#cdef inline void dealloc_biseq(biseq_t S)` : why there if commented ?

Because I forgot to remove it.

• `list2biseq` (and others) -> `list_to_biseq` ?

Acceptable.

• Comments on the function definitions --> Move it to the function's doc, even if it is their only content ?
• Could you write some doc, even if it is just one line, for cdef functions ?

I wouldn't remove the comments from the pxd file, but certainly some documentation should be added to the cdef functions in the pyx file. I wonder if this should be by comments in the code or by a doc string. After all, the doc string will not be visible (cdef functions won't appear in the docs, as they cannot be imported).

• I just figured out that you may have wanted to keep this list of functions above with short descriptions as an 'index' of what the file does. If so, it would be cool to do it in the html doc instead !

You mean: "This file provides the following boilerplate functions, that you can cimport for usage in Cython code: ..."?

• list2biseq --> I wondered why this function wouldn't return a new biseq buil from a list instead of modifying an existing one.... Performance issue ? Is that critical in some cases ?

This is motivated by my usage in the `BoundedIntegerSequence` wrapper. As usual, there is `__cinit__`, which does memory allocations (as far as I know, it is recommended not to do this kind of thing in `__init__`). Hence, when `__init__` is called (and I think filling data into the allocated memory is the job of `__init__`, not of `__cinit__`), the integer sequence is already allocated, and just needs to be filled.

### comment:54 follow-up: ↓ 55 Changed 7 years ago by mmezzarobba

Sorry if that's a stupid question (I only glanced through the code), but why are you using `mpz`'s and not `mpn`'s? It looks like you are already doing most if not all memory management by hand...

### comment:55 in reply to: ↑ 54 ; follow-up: ↓ 56 Changed 7 years ago by SimonKing

Sorry if that's a stupid question (I only glanced through the code), but why are you using `mpz`'s and not `mpn`'s?

Simply because I did not succeed in finding what functions are available for `mpn_t`. So, if you can give me a pointer (similar to the documentation for `mpz_t`), I'd appreciate. Probably non-signed types will be a tad faster than signed types.

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so? I mean, what I need is some tool to quickly operate on large bit arrays by shift operations, preferably also with a good and fast hash function, and without the need to think about the number of bits fitting into a byte or a long. I tried to implement this myself, using `char*` or `long*` (which would also mean to take care of allocating and freeing), but `mpz_t` turns out to be faster.

So, if you know another tool, that is perhaps more directly dedicated to bit arrays, then please tell!
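To make the idea concrete, here is a minimal Python sketch (names and layout are illustrative, not code from the branch) of using one big integer as a bit array: items sit at fixed offsets, and concatenation is a shift-and-or, roughly the role that `mpz_mul_2exp` and friends play in the `mpz_t`-based implementation.

```python
ITEMBITS = 5  # bits per item, enough for a bound of 32

def pack(items):
    """Pack a list of small integers into one big integer, item 0 in the low bits."""
    data = 0
    for i, x in enumerate(items):
        data |= x << (i * ITEMBITS)
    return data

def concat(data_a, len_a, data_b):
    """Concatenate two packed sequences: shift b past a's bits and OR them together."""
    return data_a | (data_b << (len_a * ITEMBITS))

def unpack(data, length):
    """Read the items back; note that the length must be stored separately."""
    mask = (1 << ITEMBITS) - 1
    return [(data >> (i * ITEMBITS)) & mask for i in range(length)]

a = pack([4, 1, 6])
b = pack([2, 7])
print(unpack(concat(a, 3, b), 5))  # [4, 1, 6, 2, 7]
```

A Python int, like `mpz_t`, handles the allocation automatically; the price, discussed below, is that per-item access goes through full-width shifts.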

### comment:56 in reply to: ↑ 55 ; follow-up: ↓ 57 Changed 7 years ago by mmezzarobba

Simply because I did not succeed in finding what functions are available for `mpn_t`. So, if you can give me a pointer (similar to the documentation for `mpz_t`), I'd appreciate it.

The documentation for `mpn_*` is at https://gmplib.org/manual/Low_002dlevel-Functions.html

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so?

I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of `mpz_t` over `mpn_*` when you are dealing with nonnegative integers (hence my first question).

Last edited 7 years ago by mmezzarobba (previous) (diff)

### comment:57 in reply to: ↑ 56 ; follow-up: ↓ 58 Changed 7 years ago by SimonKing

The documentation for `mpn_*` is at https://gmplib.org/manual/Low_002dlevel-Functions.html

Thank you, I'll have a look and will try to compare.

It looks like you are already doing most if not all memory management by hand...

What would be the alternative to doing so?

I'm not saying I see one, only that, as far as I understand, automatic memory management is the main advantage of `mpz_t` over `mpn_*` when you are dealing with nonnegative integers (hence my first question).

Now I understand what you mean by your statement that I did all memory management by hand: I am not using `mpz_init` but `mpz_init2`, telling exactly how many bits I will need. I found that it has a noticeable impact on performance, since otherwise re-allocations will happen internally.

### comment:58 in reply to: ↑ 57 Changed 7 years ago by mmezzarobba

Now I understand what you mean by your statement that I did all memory management by hand: I am not using `mpz_init` but `mpz_init2`

Yes, exactly. Sorry if I wasn't clear!

### comment:59 Changed 7 years ago by git

• Commit changed from 5dc78c5f5f647bf95b98da916d7086fae539ffb8 to efef80bea22840f618335b7dd6f5d3e9a64fa301

Branch pushed to git repo; I updated commit sha1. New commits:

 ​7ceac67 `Merge branch 'develop' into ticket/15820` ​efef80b `Relocate bounded integer sequences (sage.misc, not sage.structure)`

### comment:60 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues set to Document cdef functions

I have at some point merged the latest beta. Sorry, I keep forgetting that merging stuff prematurely is bad. Main point was to relocate the code from sage.structure to sage.misc.

I don't think I will use `mpn_*`. It just seems too low-level. For example, I see `mpn_lshift` and `mpn_rshift` operations, but they operate on limbs, not on bit arrays. Hence, in order to implement a bitshift on bit arrays (which is what I need in concatenation and many other operations), I'd still need to invest a lot of work, essentially what I have done with the proof-of-concept using `long*` to store bit arrays. I doubt that in the end it would be any faster than using `mpz_mul_2exp` and friends.
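For illustration, here is a rough Python model (assuming 64-bit limbs in little-endian order; the names are mine) of the per-limb carry propagation that a small left shift performs on an array of limbs. This is the kind of bookkeeping `mpn_lshift` encapsulates, and that a hand-rolled `long*` bit array would have to reimplement.

```python
LIMB_BITS = 64
LIMB_MASK = (1 << LIMB_BITS) - 1

def lshift_limbs(limbs, shift):
    """Shift a little-endian list of limbs left by `shift` bits (0 < shift < LIMB_BITS),
    carrying the overflowing high bits of each limb into the next one."""
    assert 0 < shift < LIMB_BITS
    out, carry = [], 0
    for limb in limbs:
        out.append(((limb << shift) | carry) & LIMB_MASK)
        carry = limb >> (LIMB_BITS - shift)
    if carry:  # the shift spilled into a new most-significant limb
        out.append(carry)
    return out
```

Shifts by more than one limb additionally need whole-limb moves, which is why building general bit-array operations on `mpn_*` is real work compared to calling `mpz_mul_2exp`.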

Documentation of the cdef functions is still missing, this is what I intend to do next.

### comment:61 follow-up: ↓ 62 Changed 7 years ago by fredrik.johansson

If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n²) instead of O(n). This makes the implementation rather useless for a lot of applications where a datatype like this would be useful (say, if you want to store a million small integers in a memory-efficient way). With mpn functions, you can get the optimal complexity.
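The complexity point can be sketched in Python (illustrative names, 64-bit limbs assumed): with the items packed at fixed offsets into an array of limbs, reading entry `i` touches at most two limbs, independently of the sequence length, whereas shifting a whole mpz right by `i*ITEMBITS` costs O(n).

```python
LIMB_BITS = 64
ITEMBITS = 5          # bits per item, enough for a bound of 32
MASK = (1 << ITEMBITS) - 1

def items_to_limbs(items):
    """Pack items at fixed 5-bit offsets into a little-endian list of 64-bit limbs."""
    data = 0
    for i, x in enumerate(items):
        data |= x << (i * ITEMBITS)
    nlimbs = (len(items) * ITEMBITS + LIMB_BITS - 1) // LIMB_BITS
    return [(data >> (j * LIMB_BITS)) & ((1 << LIMB_BITS) - 1) for j in range(nlimbs)]

def get_item(limbs, i):
    """Read item i, touching at most two limbs -- O(1) in the sequence length."""
    limb_index, offset = divmod(i * ITEMBITS, LIMB_BITS)
    word = limbs[limb_index] >> offset
    if offset + ITEMBITS > LIMB_BITS and limb_index + 1 < len(limbs):
        # the item straddles a limb boundary: fetch the missing high bits
        word |= limbs[limb_index + 1] << (LIMB_BITS - offset)
    return word & MASK

limbs = items_to_limbs([4, 1, 6, 2, 7, 20, 9])
print([get_item(limbs, i) for i in range(7)])  # [4, 1, 6, 2, 7, 20, 9]
```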

### comment:62 in reply to: ↑ 61 Changed 7 years ago by SimonKing

• Work issues changed from Document cdef functions to Document cdef functions. Pickling

If you use mpz functions, then getting/setting a single entry will be O(n) instead of O(1), and iterating over all entries or creating a sequence from a list of integers will be O(n²) instead of O(n).

Right: iteration, access to single items, and creation from a list are not as fast as they could be. Can you point me to examples (in Sage? Elsewhere?) showing how to use `mpn_*`?

PS: I also forgot to implement pickling.

### comment:63 Changed 7 years ago by git

• Commit changed from efef80bea22840f618335b7dd6f5d3e9a64fa301 to 64ac7bab9d4c85136e9845878d69d82be7c300f8

Branch pushed to git repo; I updated commit sha1. New commits:

 ​64ac7ba `Pickling of bounded integer sequence. Documentation of cdef functions`

### comment:64 Changed 7 years ago by SimonKing

• Work issues changed from Document cdef functions. Pickling to Use mpn_* for speedup of iteration and item access

I have documented the cdef functions, and I have implemented pickling. Again (at least for long sequences), this is faster than for ordinary tuples:

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L = [randint(0,26) for i in range(5000)]
sage: S = BoundedIntegerSequence(32, L)
sage: T = tuple(S)
sage: %timeit loads(dumps(S))
1000 loops, best of 3: 407 µs per loop
sage: %timeit loads(dumps(T))
100 loops, best of 3: 2.02 ms per loop
```

I hope there is a nice example for `mpn_*` somewhere.

### comment:65 Changed 7 years ago by SimonKing

Hooray, using mpn_* improves the time to access item 2000 in an integer sequence of bound 32 from 1.49 µs to 545 ns. With tuples, the access time is 362 ns. So, that's almost competitive.

### comment:66 Changed 7 years ago by SimonKing

Another hooray: Conversion of a list of 5000 integers to a bounded integer sequence of bound 32 improved from 3.25 ms to 75.3 µs. Conversion to a tuple takes only 19.6 µs, but I doubt it will be possible to beat that.

### comment:67 Changed 7 years ago by git

• Commit changed from 64ac7bab9d4c85136e9845878d69d82be7c300f8 to 836f63fb96241033953e42d6d18becfaa4fe33b7

Branch pushed to git repo; I updated commit sha1. New commits:

 ​050b118 `Improve access to items of bounded integer sequences` ​c2e3547 `Improve conversion "list->bounded integer sequence"` ​836f63f `Improve iteration and list/string conversion of bounded integer sequences`

### comment:68 Changed 7 years ago by SimonKing

It remains to improve the computation of the index of an item or of a sub-sequence, and slicing. When this is done, I'll provide updated benchmarks.

### comment:69 Changed 7 years ago by git

• Commit changed from 836f63fb96241033953e42d6d18becfaa4fe33b7 to c8a299ba54fa05d6851fd492019d9996e7e31515

Branch pushed to git repo; I updated commit sha1. New commits:

 ​c8a299b `More documentation of bounded integer sequences`

### comment:70 Changed 7 years ago by git

• Commit changed from c8a299ba54fa05d6851fd492019d9996e7e31515 to 735939e593c9c302ff42c4973cbfe2e48eb22644

Branch pushed to git repo; I updated commit sha1. New commits:

 ​775d795 `Improve index computation for bounded integer sequences` ​391a102 `Improve bounded integer subsequent containment test` ​735939e `Improve slicing of bounded integer sequences`

### comment:71 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review

Using the `mpn_*` functions was a very good idea. Some basic operations are a lot faster now. Here I repeat (and slightly extend) my earlier benchmarks for those operations that have changed in the recent commits:

The test bed

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
```

Conversion list -> tuple/sequence

```sage: %timeit x = tuple(L0)
1000000 loops, best of 3: 307 ns per loop
sage: %timeit x = tuple(L1)
1000000 loops, best of 3: 369 ns per loop
sage: %timeit x = tuple(L2)
1000000 loops, best of 3: 534 ns per loop
sage: %timeit x = tuple(L3)
10000 loops, best of 3: 19.6 µs per loop
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.34 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.57 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.7 µs per loop
```

Conversion to a tuple seems to be roughly 4 times faster. But I don't think this will be critical (at least not in my own applications...)

Conversion tuple/sequence -> list

```sage: %timeit x = list(T0)
1000000 loops, best of 3: 537 ns per loop
sage: %timeit x = list(T1)
1000000 loops, best of 3: 624 ns per loop
sage: %timeit x = list(T2)
1000000 loops, best of 3: 752 ns per loop
sage: %timeit x = list(T3)
100000 loops, best of 3: 22.6 µs per loop
sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.58 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.06 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.19 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 253 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 1.05 µs per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.87 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 4.95 µs per loop
sage: %timeit x = S3.list()
1000 loops, best of 3: 306 µs per loop
```

The gap between lists and sequences strongly depends on the bound of the sequence, which is not much of a surprise. Anyway, it is faster than before.

Slicing

For step 1, short tuples are faster than short sequences, but long sequences are faster than long tuples:

```sage: timeit("x=T0[:-1]", number=100000)
100000 loops, best of 3: 479 ns per loop
sage: timeit("x=T1[:-1]", number=100000)
100000 loops, best of 3: 548 ns per loop
sage: timeit("x=T2[:-1]", number=100000)
100000 loops, best of 3: 773 ns per loop
sage: timeit("x=T3[:-1]", number=100000)
100000 loops, best of 3: 19.6 µs per loop
sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.75 µs per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.8 µs per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.77 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.4 µs per loop
```

As before, a step different from 1 is bad for sequences. However, there is an improvement compared with the previous timings:

```sage: timeit("x=T2[:-1:2]", number=100000)
100000 loops, best of 3: 944 ns per loop
sage: timeit("x=T3[:-1:2]", number=100000)
100000 loops, best of 3: 11.8 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.7 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop
```

Accessing single items

Bounded integer sequences can now compete with tuples, both short and long!

```sage: timeit("x=T0[1]", number=1000000)
1000000 loops, best of 3: 349 ns per loop
sage: timeit("x=T0[4]", number=1000000)
1000000 loops, best of 3: 351 ns per loop
sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=T3[1]", number=1000000)
1000000 loops, best of 3: 346 ns per loop
sage: timeit("x=T3[4500]", number=1000000)
1000000 loops, best of 3: 347 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 350 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 370 ns per loop
```

Containment tests

Here we talk about operations that do not exist in this form for tuples (e.g., subsequence tests). They have been improved by the latest commits:

```sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 230 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 236 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 511 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 215 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 219 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 692 ns per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 238 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 228 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 1.92 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 981 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 213 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.41 ms per loop
```

Conclusion

• Slicing and iteration are still a weak point of bounded integer sequences. However, there is an improvement by the latest commits.
• Access to single items is now competitive. Some other operations have been competitive or superior before.
• Probably iteration and slicing could be improved with more complicated code: Currently, I take a single item, do a bitshift (which is faster than before!), and then proceed with the next item. But since several items usually fit into one "limb" (this is how GMP stores the data), it would be possible to accumulate a group of items into one limb, do a single bitshift for this group of items, and then proceed to the next group of items. I would consider implementing it if my own applications show the need for it...
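The grouped-extraction idea from the last bullet could look roughly like this in Python (a sketch under assumed names and a 64-bit limb size, not code from the branch): one big shift per group of items instead of one per item.

```python
LIMB_BITS = 64
ITEMBITS = 5
MASK = (1 << ITEMBITS) - 1
PER_GROUP = LIMB_BITS // ITEMBITS   # 12 five-bit items fit entirely in one 64-bit word

def pack(items):
    """Pack small integers into one big integer, item 0 in the low bits."""
    data = 0
    for i, x in enumerate(items):
        data |= x << (i * ITEMBITS)
    return data

def unpack_grouped(data, length):
    """Decode `length` items, shifting the whole number down only once per
    group of PER_GROUP items instead of once per item."""
    items = []
    while length > 0:
        word = data & ((1 << LIMB_BITS) - 1)   # grab one limb-sized chunk
        for _ in range(min(length, PER_GROUP)):
            items.append(word & MASK)          # cheap shifts within one word
            word >>= ITEMBITS
        data >>= PER_GROUP * ITEMBITS          # one big shift per group
        length -= PER_GROUP
    return items
```

In the C/Cython setting the inner loop would operate on a single machine word, so only the per-group shift touches the full bit array.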

Needs review, I think! Please check if I produced a memory leak (by not freeing allocated temporary data), a memory corruption (by messing up in the case of items that are stored exactly on the seam between two limbs), or functions in which a memory error or keyboard interrupt would not be propagated. I plan to do such tests, too, but I suppose 4 or 6 eyes see more than 2.

### comment:72 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues changed from Use mpn_* for speedup of iteration and item access to Fix flakiness in adding sequences

I found failures in some runs:

```king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-10-7f6bcfa0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1059, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
S+T
Expected:
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
Got:
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>
**********************************************************************
[178 tests, 1 failure, 0.79 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-15-6cf37501.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
[178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-20-d43d6691.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1061, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__add__
Failed example:
T+S
Expected:
<4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
Got:
<4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 25>
**********************************************************************
[178 tests, 1 failure, 0.78 s]
----------------------------------------------------------------------
sage -t src/sage/misc/bounded_integer_sequences.pyx  # 1 doctest failed
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-25-04f71350.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
[178 tests, 0.78 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-29-a2a81517.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
[178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-33-e89cc5fe.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
[178 tests, 0.81 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
king@linux-etl7:~/Sage/git/sage> ./sage -t src/sage/misc/bounded_integer_sequences.pyx
Running doctests with ID 2014-06-03-13-33-38-83426cf0.
Doctesting 1 file.
sage -t src/sage/misc/bounded_integer_sequences.pyx
[178 tests, 0.79 s]
----------------------------------------------------------------------
All tests passed!
----------------------------------------------------------------------
Total time for all tests: 0.9 seconds
cpu time: 0.8 seconds
cumulative wall time: 0.8 seconds
```

My first guess is that this comes from using `sage_malloc` in some places, i.e., I should zero the memory first.

### comment:73 Changed 7 years ago by SimonKing

There is something seriously broken, apparently in the last limb:

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: T = BoundedIntegerSequence(21, [4,1,6,2,8,15])
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 15>
sage: T+S
<4, 1, 6, 2, 8, 15, 4, 1, 6, 2, 7, 20, 9>
sage: S+T
<4, 1, 6, 2, 7, 20, 9, 4, 1, 6, 2, 8, 31>
```

My current guess is that in some place I get the number of limbs wrong when I manually assign it to the underlying `mpz_t`.

### comment:74 Changed 7 years ago by SimonKing

Now I can explain what goes wrong in above example.

GMP has its limbs, and I have my sequence items. In the example, each item consumes 5 bits, but each limb comprises 32 bits. As it happens, the last bit of my last item is 0, and the last limb only stores this single zero bit. Hence, when I ask GMP to do high-level operations (which makes sense for concatenation; I wouldn't like to do it low-level), GMP tries to be clever and drops the last limb, since it is zero. But when I transform the result to a list, I want to access the last bit. Since that bit is no longer in a limb, it belongs to non-allocated memory. Hence, randomly, the last bit can be 0 or 1, which explains why in the example above the difference between the correct and the wrong last item is 16.

So, what I should do: When transforming a sequence to a string or list (or when iterating), I should check whether I am running outside of GMP's limbs. Alternatively, I could put a 1-bit right after my last item, so that GMP will not drop the last limb (as I forced it to be non-zero).
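A Python analogue of the pitfall, assuming a simple 5-bits-per-item packing: like a normalized mpz, a Python int keeps no record of leading zero bits, so trailing zero items leave no trace, and either the length must be consulted on every read or a sentinel bit (as proposed above) must keep the top of the data non-zero.

```python
ITEMBITS = 5

def pack(items):
    """Pack small integers into one big integer, item 0 in the low bits."""
    data = 0
    for i, x in enumerate(items):
        data |= x << (i * ITEMBITS)
    return data

# Trailing zero items vanish: both sequences pack to the same number.
print(pack([3]) == pack([3, 0, 0]))  # True

# Remedy: a sentinel 1-bit just past the last item keeps the representations
# distinct, so the "last limb" can never be normalized away.
length = 3
with_sentinel = pack([3, 0, 0]) | (1 << (length * ITEMBITS))
print(with_sentinel == pack([3]))  # False
```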

### comment:75 Changed 7 years ago by SimonKing

Alternatively, I could replace all `mpz_*` by `mpn_*`. But that would be boring.

### comment:76 Changed 7 years ago by SimonKing

PS: Adding sequences that are constant zero exhibits the same problem.

### comment:77 Changed 7 years ago by SimonKing

I found examples that used to trigger the above-mentioned problems reliably. So, they will be added to the docs. I fixed the iteration problem, but I am still struggling with pickling.

### comment:78 Changed 7 years ago by SimonKing

Very odd. Could it be that I hit a bug in GMP in the conversion of an integer to a string representation at base `2^n`? Namely, in my examples, the 32-adic string representation used to pickle a bounded integer sequence is not representing the stored bit array, but the 31-adic or 10-adic string representation works fine (but probably is slower).

This seems to happen when the last limb used to store a GMP integer is in fact zero.

### comment:79 Changed 7 years ago by SimonKing

Yes, setting the `_mp_size` field of `__mpz_struct` in such a way that the limb `_mp_d[_mp_size-1]` is non-zero solves the problem. I thought GMP would cope with "superfluous" zeroes, but in string representation it does not.

So, I have to check where I have been lazy with the `_mp_size` field when using `mpn_*` functions.

### comment:80 Changed 7 years ago by SimonKing

Current status is a struggle with sub-sequence recognition in the presence of trailing zeroes.

Here is the type of examples that I propose:

• Use a sequence of length 7 with a bound of 32. Then, we need 35 bits to store the sequence.
• At least on my machine, a GMP limb comprises 32 bits. Hence, the above sequence fits into two limbs, and we only use three bits of the second limb.
• Choose the last item of the sequence such that it fits into 2 bits. Then, the three bits stored in the second limb are zero.
• Now, do an operation that involves an `mpz_*` function to create a new bounded integer sequence, only adding trailing zeroes. This currently happens in concatenation and slicing. It may result in GMP cutting off the trailing zeroes.

When we then do operations involving `mpn_*` functions, it is needed to take care of the cut off bits (and treat them as zero). My current private branch (not pushed yet) involves several tests that are failing with the branch currently attached here, but succeed in my current branch.

Still failing is this:

```            sage: X = BoundedIntegerSequence(21, [4,1,6,2,7,2,3])
sage: S = BoundedIntegerSequence(21, [0,0,0,0,0,0,0])
<4, 1, 6, 2, 7, 2, 3, 0, 0, 0, 0, 0, 0, 0>
True
sage: T = BoundedIntegerSequence(21, [0,4,0,1,0,6,0,2,0,7,0,2,0,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0])
sage: T[3::2]==(X+S)[1:]
True
sage: T[3::2] in X+S
False     # should return True!
```

EDIT: Recall that a bound of 21 will internally be changed to the next power of two, hence 32.

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:81 Changed 7 years ago by git

• Commit changed from 735939e593c9c302ff42c4973cbfe2e48eb22644 to c900a2cd4045aa3463be31d76c9f047b05571958

Branch pushed to git repo; I updated commit sha1. New commits:

 ​c900a2c `Take care of GMP's removal of trailing zeroes`

### comment:82 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review
• Work issues Fix flakiness in adding sequences deleted

I solved the problems with wrong sizes / trailing zeroes. I added all tests that used to fail with previous versions as doc tests. All tests are passing now. Hence: Needs review again. Tomorrow, I'll provide new timings, as the new sanity checks may result in a small slow-down. We will see...

### comment:83 follow-up: ↓ 84 Changed 7 years ago by SimonKing

I am repeating the tests; in each case (even when the underlying function has not changed), I state the timing of the old code:

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.4 µs per loop  # was 1.34 µs
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.59 µs per loop # was 1.57 µs
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 2.13 µs per loop  # was 2.06 µs
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 76.3 µs per loop   # was 76.7 µs
```

It surprises me that some of the following became faster:

```sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.55 µs per loop # was 1.58 µs
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.14 µs per loop  # was 2.06 µs
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.31 µs per loop  # was 4.19 µs
sage: %timeit x = list(S3)
1000 loops, best of 3: 274 µs per loop     # was 253 µs
sage: %timeit x = S0.list()
1000000 loops, best of 3: 810 ns per loop  # was 1.05 µs
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.22 µs per loop # was 1.87 µs
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.9 µs per loop   # was 4.95 µs
sage: %timeit x = S3.list()
10000 loops, best of 3: 105 µs per loop    # was 306 µs
```
```sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop  # was 1.75 µs
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.76 µs per loop  # was 1.8 µs
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.78 µs per loop  # was 1.77 µs
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.46 µs per loop  # was 2.4 µs
```
```sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.68 µs per loop   # was 2.7 µs
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 52.2 µs per loop   # was 52.2 µs
```

The following apparently became slower:

```sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 370 ns per loop  # was 354 ns
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 375 ns per loop  # was 355 ns
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 371 ns per loop  # was 350 ns
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 391 ns per loop  # was 370 ns
```
```sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 233 ns per loop  # was 230 ns
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 238 ns per loop  # was 236 ns
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 457 ns per loop  # was 511 ns
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 219 ns per loop  # was 215 ns
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop  # was 219 ns
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 813 ns per loop  # was 692 ns
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 244 ns per loop  # was 238 ns
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 223 ns per loop  # was 228 ns
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 5.59 µs per loop # was 1.92 µs
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 990 ns per loop  # was 981 ns
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 216 ns per loop  # was 213 ns
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.33 ms per loop    # was 2.41 ms
```

So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as replacement of tuples in some applications.

### comment:84 in reply to: ↑ 83 Changed 7 years ago by ncohen

So, overall, I think that the timings are still decent and it makes sense to use bounded integer sequences as replacement of tuples in some applications.

Especially since they are probably MUCH more compact in memory.

Nathann

### comment:85 Changed 7 years ago by git

• Commit changed from c900a2cd4045aa3463be31d76c9f047b05571958 to 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e

Branch pushed to git repo; I updated commit sha1. New commits:

 ​1192c21 `Allow empty slices; bounded integer sequence -> bool`

### comment:86 Changed 7 years ago by SimonKing

I just fixed a segfault that occurred for empty slices, and I implemented conversion of a sequence to bool (in the usual way: a sequence is nonzero if and only if it has positive length). Still needs review...

### comment:87 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues set to Fix other out of bound errors

Sigh. I hate trailing zeroes.

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: (BoundedIntegerSequence(21, [0,0]) + BoundedIntegerSequence(21, [0,0]))
<16, 0, 30, 30>
```

### comment:88 Changed 7 years ago by SimonKing

My original plan was to provide a function computing the maximal overlap of sequences in the follow-up ticket #16453. However, while I am fixing bugs anyway, I'll move this new function here.

### comment:89 Changed 7 years ago by git

• Commit changed from 1192c211965ea1dcf1fdbaeaec7e68bf61ba265e to ff4477a1600b44031c2fb962ff3c5ab24c110410

Branch pushed to git repo; I updated commit sha1. New commits:

 ​ff4477a `Remove biseq_to_str. Add max_overlap. Fix boundary violations`

### comment:90 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review

Done!

Nathann, while I was at it, I have removed the `biseq_to_str` function, as we have discussed off-trac.

### comment:91 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues changed from Fix other out of bound errors to Fix containment test

Again, there is trouble with zeroes. The sequence `<0>` is currently considered a subsequence of the sequence `<1,2,3>`:

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S1 = BoundedIntegerSequence(3,[1,3])
sage: S2 = BoundedIntegerSequence(3,[0])
sage: S2 in S1    # very wrong
True
sage: S1.startswith(S2) # correct
False
```

### comment:92 Changed 7 years ago by SimonKing

It seems that the problem lies in `mpz_congruent_2exp_p(A, B, nbits)`: In the problematic situation, `A==0`, `B==13` and `nbits==2`. The function should return "False", since 13 is not congruent to 0 modulo 4 (i.e., modulo two bits). But it doesn't. And that's totally weird. I'll try to reproduce it independently of the bounded integer sequences, i.e., I will see if it is a horrible GMP bug. Probably it isn't.
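For reference, `mpz_congruent_2exp_p(A, B, b)` tests whether A and B agree modulo 2^b, i.e. in their lowest b bits. A pure-Python model of the expected behaviour (a sketch, not GMP's code):

```python
def congruent_2exp(a, b, nbits):
    # True iff a ≡ b (mod 2**nbits), i.e. the low nbits bits agree
    return (a - b) % (1 << nbits) == 0

print(congruent_2exp(0, 13, 2))  # → False, since 13 % 4 == 1
print(congruent_2exp(0, 12, 2))  # → True,  since 12 % 4 == 0
```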

### comment:93 Changed 7 years ago by SimonKing

Indeed, it isn't a GMP bug:

```sage: cython("""
....: include "sage/libs/ntl/decl.pxi"
....: def test(x,y):
....:     cdef mpz_t a,b
....:     mpz_init_set_ui(a, x)
....:     mpz_init_set_ui(b, y)
....:     print mpz_congruent_2exp_p(a,b,2)
....:     print mpz_get_ui(a), "vs", mpz_get_ui(b)
....:     print (<__mpz_struct*>a)._mp_size
....:     print (<__mpz_struct*>b)._mp_size
....:     mpz_clear(a)
....:     mpz_clear(b)
....: """)
sage: test(0,13)
False
0 vs 13
0
1
```

My current impression is that in the same situation as above, `mpz_congruent_2exp_p` gives the wrong result (`True`) in the bounded integer sequence code. Strange.

### comment:94 Changed 7 years ago by SimonKing

Argh, now I see the difference: due to a bug, I somehow managed to give the number 13 the size zero. `mpz_congruent_2exp_p` thus believes that 13==0 and correctly concludes that 0 is congruent to 13 modulo 4.
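That explanation is consistent with how GMP reads an `mpz_t`: only the first `_mp_size` limbs count, so a size field of zero makes any value read as 0. Roughly, as a sketch (assuming a hypothetical 64-bit limb for concreteness):

```python
def mpz_value(limbs, mp_size, limb_bits=64):
    # GMP interprets only the first mp_size limbs; anything beyond is ignored
    return sum(limb << (i * limb_bits) for i, limb in enumerate(limbs[:mp_size]))

limbs = [13]
print(mpz_value(limbs, 1))  # → 13
print(mpz_value(limbs, 0))  # → 0: with the size forced to 0, the 13 is invisible
```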

### comment:95 Changed 7 years ago by git

• Commit changed from ff4477a1600b44031c2fb962ff3c5ab24c110410 to 93546db866564a80d80df30de48934697e2b0d1d

Branch pushed to git repo; I updated commit sha1. New commits:

 ​fd1675d `Merge branch 'develop' into t/15820/ticket/15820` ​93546db `Fix subsequence containment test.`

### comment:96 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review
• Work issues Fix containment test deleted

The problem is fixed. I inserted comments into the code that hopefully explain what happens in the subsequence containment test.

Doctests pass (including a new test that has previously failed), hence, it needs review again!

### comment:97 Changed 7 years ago by vbraun_spam

• Milestone changed from sage-6.3 to sage-6.4

### comment:98 follow-up: ↓ 99 Changed 7 years ago by chapoton

• Status changed from needs_review to needs_work

still two failing doctests, see patchbot report

### comment:99 in reply to: ↑ 98 Changed 7 years ago by SimonKing

still two failing doctests, see patchbot report

This is all very odd. All patchbots report the same error, but I don't see it on my own machine. So far, I don't know where I could start debugging.

### comment:100 Changed 7 years ago by SimonKing

Is it perhaps the case that something relevant (GMP) was upgraded after sage-6.3.beta5? That's the version that I use on my laptop.

### comment:101 Changed 7 years ago by SimonKing

Currently I won't upgrade to sage-6.4, as I am rather busy with another project. Could one of you please test if the error appears in sage-6.4 but not in sage-6.3?

### comment:102 Changed 7 years ago by chapoton

I will test on 6.4.beta1. Building right now.

### comment:103 Changed 7 years ago by chapoton

I confirm the doctest failures on 6.4.beta1:

```sage -t --long src/sage/misc/bounded_integer_sequences.pyx
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1086, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
S[1::2]
Expected:
<1, 2, 20>
Got:
<1, 6, 23>
**********************************************************************
File "src/sage/misc/bounded_integer_sequences.pyx", line 1088, in sage.misc.bounded_integer_sequences.BoundedIntegerSequence.__getitem__
Failed example:
S[-1::-2]
Expected:
<9, 7, 6, 4>
Got:
<9, 7, 22, 15>

```

### comment:104 Changed 7 years ago by SimonKing

Interesting!

So, what has changed between sage-6.3.b5 and sage-6.4.b1? Specifically, has something happened with GMP?

### comment:105 Changed 7 years ago by SimonKing

PS: Can you reproduce it on the command line?

At some point I should upgrade, to fix the issue...

### comment:106 Changed 7 years ago by chapoton

Yes, same behaviour in the command line.

### comment:107 Changed 7 years ago by SimonKing

I have upgraded now, and I do not see the error.

So, what could we do to trace the problem down?

### comment:108 Changed 7 years ago by SimonKing

Since I cannot reproduce the problem myself, could you please help me by providing some internal data? Perhaps there are different architectures (32bit or 64bit, big or little endian) at work?

Please add the following as a method of `BoundedIntegerSequence`:

```    def inspect(self):
        cdef __mpz_struct seq
        seq = deref(<__mpz_struct*>self.data.data)
        print "bitsize",self.data.bitsize
        print "itembitsize",self.data.itembitsize
        print "length", self.data.length
        print "GMP size", seq._mp_size
        print "GMP alloc", seq._mp_alloc
        print '.'.join([Integer(seq._mp_d[limb]).binary().rjust(mp_bits_per_limb,'0') for limb in range(seq._mp_size-1,-1,-1)])
```

With this, I get

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
length 7
GMP size 2
GMP alloc 3
00000000000000000000000000000010.01101000011100010001100000100100
sage: S[1::2]
<1, 2, 20>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
length 3
GMP size 1
GMP alloc 2
00000000000000000101000001000001
sage: S[-1::-2]
<9, 7, 6, 4>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
length 4
GMP size 1
GMP alloc 2
00000000000000100001100011101001
```

### comment:109 Changed 7 years ago by chapoton

got that :

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: S = BoundedIntegerSequence(21, [4,1,6,2,7,20,9])
sage: S.inspect()
bitsize 35
itembitsize 5
length 7
GMP size 1
GMP alloc 2
0000000000000000000000000000001001101000011100010001100000100100
sage: S[1::2]
<1, 6, 23>
sage: (S[1::2]).inspect()
bitsize 15
itembitsize 5
length 3
GMP size 1
GMP alloc 2
0000000000000000000000000000000000010011110111111101110011000001
sage: S[-1::-2]
<9, 7, 22, 15>
sage: (S[-1::-2]).inspect()
bitsize 20
itembitsize 5
length 4
GMP size 1
GMP alloc 2
0000000000000001001101000011101011101100011101111101100011101001
```

### comment:110 Changed 7 years ago by SimonKing

Thank you! So, one obvious difference is that on your computer, GMP uses chunks of 64 bit to store long integers, whereas it is only 32 bit on my machine (even though it is a 64 bit CPU). I'll see if I can make sense of the error.
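That difference fully accounts for the two machines taking different code paths: the number of limbs a sequence occupies is just the total bit size divided by the limb width, rounded up. A quick sketch of the arithmetic:

```python
def limbs_needed(length, itembits, limb_bits):
    # ceiling division: total bits used / bits per limb
    return -(-(length * itembits) // limb_bits)

# the sequence <4, 1, 6, 2, 7, 20, 9>: 7 items of 5 bits = 35 bits
print(limbs_needed(7, 5, 32))  # → 2 limbs, matching "GMP size 2" above
print(limbs_needed(7, 5, 64))  # → 1 limb, matching "GMP size 1" on the other machine
```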

### comment:111 Changed 7 years ago by SimonKing

Another interesting point is that the single limb used to store `S[1::2]` and `S[-1::-2]` on your machine is full of junk bits.

Anyway, what surprises me is the following: Up to now, all problems arose because of a misfit where two limbs meet. But here, for the first time, we have errors occurring within a single limb.

### comment:112 Changed 7 years ago by SimonKing

Hooray, I can reproduce the problem on my machine, with a different example!

The key is (as on your machine) to create a sequence that fits inside a single limb, and then slice:

```sage: S = BoundedIntegerSequence(8, [4,1,6,2,7,2,5,5,2])
sage: S
<4, 1, 6, 2, 7, 2, 5, 5, 2>
sage: S[1::2]
<1, 6, 7, 7>
sage: S[-1::-2]
<2, 5, 7, 6, 7>
```

The sequence above is of length 9, with items fitting into 3 bits. Hence, 27 bits are used, whereas one limb comprises 32 bits.

So, stuff for debugging!

### comment:113 Changed 7 years ago by SimonKing

Got it!

In some branches of the `slice_biseq` function, I forgot to restrict the to-be-copied data down to the bitsize of items (i.e., I forgot `...&S.mask_item`). This oversight only happened in branches dealing with the case that everything fits within a single limb. Hence, no error in the old doctest for me (because it comprises two limbs on my machine), but an error for you (because it comprises a single limb on your machine).
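The effect of a missing item mask can be reproduced with plain Python integers. In this sketch (names are mine, not the ones in `slice_biseq`), the `& mask` step corresponds to the forgotten `...&S.mask_item`:

```python
def get_item(data, index, itembits, mask=True):
    item = data >> (index * itembits)
    if mask:
        item &= (1 << itembits) - 1  # restrict to the item's own bits
    return item

# pack the 3-bit items [4, 1, 6] into one word
data = 4 | (1 << 3) | (6 << 6)
print(get_item(data, 1, 3))              # → 1
print(get_item(data, 1, 3, mask=False))  # junk: the bits of the item 6 leak in
```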

### comment:114 Changed 7 years ago by git

• Commit changed from 93546db866564a80d80df30de48934697e2b0d1d to c69c67ccf3093a3c947ebbe83d5224daf695c3f9

Branch pushed to git repo; I updated commit sha1. New commits:

 ​3b2f713 `Merge branch 'develop' into t/15820/ticket/15820` ​c69c67c `Use a bitmask when slicing bounded integer sequences`

### comment:115 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review

The fix is pushed, and of course the example that was failing on my machine has become a new doctest.

### comment:116 Changed 7 years ago by git

• Commit changed from c69c67ccf3093a3c947ebbe83d5224daf695c3f9 to b331dc4f12e7291e9537f607ed155085cabc3fcd

Branch pushed to git repo; I updated commit sha1. New commits:

 ​b331dc4 `Fix mem leak converting zero-valued biseq_t to list`

### comment:117 Changed 7 years ago by SimonKing

I have fixed a memory leak: In the function `biseq_to_list`, some temporary buffer was allocated, but in the case that the sequence is zero valued, the function returns without freeing the buffer. Solution: Only allocate the buffer after checking that the sequence is not zero valued.

Still needing review (hint-hint)...

### comment:118 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues set to memory corruption

I found that the following function (written into `bounded_integer_sequences.pyx`) manages to crash Sage:

```def _biseq_stresstest():
    cdef int i
    from sage.misc.prandom import randint
    cdef list L = [BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))]) for y in range(100)]
    cdef int branch
    cdef BoundedIntegerSequence S, T
    for i from 0<=i<10000:
        if (i%1000) == 0:
            print i
        branch = randint(0,4)
        if branch == 0:
            L[randint(0,99)] = L[randint(0,99)]+L[randint(0,99)]
        elif branch == 1:
            x = randint(0,99)
            if len(L[x]):
                y = randint(0,len(L[x])-1)
                z = randint(y,len(L[x])-1)
                L[randint(0,99)] = L[x][y:z]
            else:
                L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
        elif branch == 2:
            t = list(L[randint(0,99)])
            t = repr(L[randint(0,99)])
            t = L[randint(0,99)].list()
        elif branch == 3:
            x = randint(0,99)
            if len(L[x]):
                t = L[x][randint(0,len(L[x])-1)]
                try:
                    t = L[x].index(t)
                except ValueError:
                    print L[x]
                    print t
                    raise
            else:
                L[x] = BoundedIntegerSequence(6, [randint(0,5) for x in range(randint(4,10))])
        elif branch == 4:
            S = L[randint(0,99)]
            T = L[randint(0,99)]
            startswith_biseq(S.data,T.data)
            contains_biseq(S.data, T.data, 0)
            max_overlap_biseq(S.data, T.data)
```

Can someone please see what happens on other machines? On mine (with the above function, but also with something that was supposed to be the cornerstone of my current project), crashes occur in `sage_free` or `mpz_clear`. Surprisingly, the variables-to-be-freed/cleared should be just fine; in some cases they haven't even been touched after allocation.

My wild guess is that using `sage_malloc` (in `bounded_integer_sequences.pyx`) somehow interferes with gmp's (mpir's?) memory management. Even if the reason is different, I suppose it would make sense to get rid of `sage_malloc` and `sage_free` in my code. After all, I use it for buffers whose length is known at compile time. Hence, I should better use arrays.

So, my plan is: Use arrays instead of `sage_malloc`, and hope that this is enough to let above function pass, which would then become a doctest.

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:119 follow-up: ↓ 120 Changed 7 years ago by SimonKing

Hmm. Using, for example,

```    cdef mp_limb_t tmp_limb[1]
```

instead of

```    cdef mp_limb_t *tmp_limb
    tmp_limb = <mp_limb_t*>sage_malloc(sizeof(mp_limb_t))
```

compiles, but then it seems that the `mpn_*` functions do not correctly put values into the arrays, when I give them `tmp_limb` as argument.

Question to C/Cython experts: Is `tmp_limb` not just of type `<mp_limb_t*>` when I define `cdef mp_limb_t tmp_limb[1]`?

### comment:120 in reply to: ↑ 119 ; follow-up: ↓ 121 Changed 7 years ago by SimonKing

Question to C/Cython experts: Is `tmp_limb` not just of type `<mp_limb_t*>` when I define `cdef mp_limb_t tmp_limb[1]`?

Forget the question. The problem was somewhere else. Now I think I can make arrays work as a replacement for the current usage of `sage_malloc`.

### comment:121 in reply to: ↑ 120 Changed 7 years ago by SimonKing

Question to C/Cython experts: Is `tmp_limb` not just of type `<mp_limb_t*>` when I define `cdef mp_limb_t tmp_limb[1]`?

Forget the question. The problem was somewhere else. Now I think I can make arrays work as a replacement for the current usage of `sage_malloc`.

Do not forget the question. There is something extremely strange going on. When I insert a print statement after assigning the size of an `__mpz_struct*`, then things work. Without the print statement, the assignment of the size is ignored and the value is incorrectly left at zero.

So, how can a print statement influence whether or not some value of a variable in GMP is assigned?

### comment:122 Changed 7 years ago by git

• Commit changed from b331dc4f12e7291e9537f607ed155085cabc3fcd to 7b53dc82d032d5a787e10731938f9048b620031b

Branch pushed to git repo; I updated commit sha1. New commits:

 ​7b53dc8 `Try to improve memory management for biseq_t, and add a stress test`

### comment:123 Changed 7 years ago by SimonKing

I have replaced `sage_malloc` by arrays, since the array size is known at compile time. I think this is a good idea anyway, and the doctests pass.

But not all is good, and here I need help: `_biseq_stresstest()` either crashes with `free(): invalid next size (fast)`, or it raises a value error stating that some element is not in a bounded integer sequence, even though by construction it is contained. So, memory management still goes haywire.

Any clues or pointers?

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:124 Changed 7 years ago by SimonKing

I was running it under valgrind---and as a result, I did not see a crash! However, the other problem (a value error when a sequence is supposed to return the index of one of its elements) still persists under valgrind.

Concerning the crash: What can one do to further debug if a crash can not be reproduced with valgrind?

### comment:125 Changed 7 years ago by SimonKing

PS: I can not quit sage when I run it under valgrind---"quit" has no effect!

### comment:126 Changed 7 years ago by SimonKing

Even "pkill python" did not manage to kill the valgrinded Sage session. Boy, am I fed up now. Time to call it a day.

### comment:127 Changed 7 years ago by SimonKing

Actually I realise that I made a severe mistake all the time. `concat_biseq` returns a pointer to a local variable, and I did so in order to make it possible to propagate errors (by saying `... except NULL`). But returning a pointer to a local variable can not possibly work.

Since I want to propagate errors, I suppose it will be better to consistently work with pointers. I'll rewrite the module accordingly...

### comment:128 Changed 7 years ago by SimonKing

It seems that this was indeed the reason! After rewriting everything so that `biseq_t` becomes a pointer to a struct (before, `biseq_t` was the struct), all doctests pass, and the stresstest doesn't crash.

But not all is good: The stresstest doesn't crash, but sporadically it cannot find the index of an element of a sequence. So, it needs work and I will not push yet, but it is almost done.

### comment:129 Changed 7 years ago by SimonKing

The stresstest found that the following fails:

```sage: S = BoundedIntegerSequence(6, [3, 5, 3, 1, 5, 2, 2, 5, 3, 3, 4])
sage: S[-1]
4
```

Currently it returns 0.

### comment:130 Changed 7 years ago by git

• Commit changed from 7b53dc82d032d5a787e10731938f9048b620031b to 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2

Branch pushed to git repo; I updated commit sha1. New commits:

 ​7c9f0f2 `Biseq refactored, so that only pointers are passed around.` ​0aa3cbc `Fix corner cases in item access for bounded integer sequences`

### comment:131 Changed 7 years ago by SimonKing

The stress test is really valuable. I found several small problems concerning corner cases. And, as I have announced, I have reworked the code so that only "proper" pointers rather than stupid references to local variables are passed around.

There remain problems, though. In the doctests, I have commented out the stresstest, because there are STILL crashes. This time, `mpz_clear` sometimes doesn't work. My wild guess is that at some point I write out of bounds when using `mpn_*`-functions.

New commits:

 ​7c9f0f2 `Biseq refactored, so that only pointers are passed around.` ​0aa3cbc `Fix corner cases in item access for bounded integer sequences`

### comment:132 Changed 7 years ago by SimonKing

Hooray, by inserting explicit range checks, I found that I am indeed writing outside the allocated area in `slice_biseq`. That (and similar errors?) will hopefully explain the crashes...

### comment:133 Changed 7 years ago by git

• Commit changed from 0aa3cbcfb1be64154ad32a419f250a9c6e52dcf2 to f20dc096832f260edee2f28b3007bf42263f54ac

Branch pushed to git repo; I updated commit sha1. New commits:

 ​f20dc09 `Fix writing out of bounds, and assert that the bounds are respected`

### comment:134 follow-up: ↓ 135 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review

In the new commit, I added assertions to make sure that I am not writing into non-allocated memory. So, the code should now be watertight in that regard.

All tests pass, including the "stress test". So, it can be reviewed now!

Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?

### comment:135 in reply to: ↑ 134 Changed 7 years ago by nthiery

Question to the reviewer: Can the assertions be removed? After all, what is being asserted is just that some variables are correctly computed, but this can in principle be checked by reading the code. Or are assertions (even in relatively tight loops) cheap enough?

If you run `Python` with the option `-O`, all the assert calls are ignored. Something similar should be doable, at compile time of course, for Cython.

Most programming languages support this to encourage developers to never remove asserts. We really should have this feature in Sage. A good starting point would be to have `sage -O` call `Python -O`.
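For comparison, this is exactly what CPython does: under `-O`, `assert` statements are compiled away, so they cost nothing in optimized runs. A small demonstration via subprocesses:

```python
import subprocess
import sys

code = "assert False, 'boom'"

# default mode: the assert fires and the interpreter exits with an error
normal = subprocess.run([sys.executable, "-c", code])
print(normal.returncode != 0)  # → True

# with -O, the assert is stripped at compile time, so nothing happens
optimized = subprocess.run([sys.executable, "-O", "-c", code])
print(optimized.returncode)  # → 0
```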

Cheers,

Nicolas

### comment:136 Changed 7 years ago by git

• Commit changed from f20dc096832f260edee2f28b3007bf42263f54ac to cea38bb0a7d6f748cfc1feef357df5e22eb7b652

Branch pushed to git repo; I updated commit sha1. New commits:

 ​cea38bb `Change doc according to the changed functionality of list_to_biseq`

### comment:137 Changed 7 years ago by SimonKing

• Work issues memory corruption deleted

### comment:138 Changed 7 years ago by git

• Commit changed from cea38bb0a7d6f748cfc1feef357df5e22eb7b652 to 4611f8aa591616f118b50ebc2d2516c635e40318

Branch pushed to git repo; I updated commit sha1. New commits:

 ​4611f8a `Minor changes in the docs`

### comment:139 Changed 7 years ago by git

• Commit changed from 4611f8aa591616f118b50ebc2d2516c635e40318 to 815d77c77f48605ea33b87a88120aaabadfb5f7f

Branch pushed to git repo; I updated commit sha1. New commits:

 ​815d77c `mpn_r/lshift should only be used with strictly positive shift`

### comment:140 Changed 7 years ago by SimonKing

This is to explain the most recent commit: By doing some non-commutative Gröbner basis computations relying on "biseq_t", I obtained crashes apparently caused by the `slice_biseq` function. It is using `mpn_rshift`, and it did so even when the shift was zero. However, I found in the docs that the shift is supposed to be at least 1. Hence, when there is no shift, I am now using `mpn_copyi` instead. Hope it helps to prevent the crash...
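A pure-Python model of the guarded shift (my own sketch; the real code works on GMP limbs, and unlike this model `mpn_rshift` additionally returns the bits shifted out):

```python
def rshift_limbs(limbs, shift, limb_bits=64):
    mask = (1 << limb_bits) - 1
    if shift == 0:
        # mpn_rshift requires 1 <= shift < limb_bits; plain copy instead,
        # analogous to using mpn_copyi
        return list(limbs)
    out = []
    for i, limb in enumerate(limbs):
        lo = limb >> shift
        # bits shifted in from the next-higher limb, if there is one
        hi = (limbs[i + 1] << (limb_bits - shift)) & mask if i + 1 < len(limbs) else 0
        out.append(lo | hi)
    return out

print(rshift_limbs([0b1100, 0b1], 0))  # → [12, 1]: unchanged copy
print(rshift_limbs([0b1100, 0b1], 2))  # low limb receives two bits from the high limb
```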

### comment:141 Changed 7 years ago by SimonKing

Argh. No, it did not fix the crash in my Gröbner basis application. Anyway, I still think the commit makes sense.

### comment:142 Changed 7 years ago by git

• Commit changed from 815d77c77f48605ea33b87a88120aaabadfb5f7f to 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a

Branch pushed to git repo; I updated commit sha1. New commits:

 ​6dfb1cb `Make contains_biseq interruptible and add to its doc`

### comment:143 follow-up: ↓ 144 Changed 7 years ago by vbraun

I'm getting three valgrind warnings (#15586):

```sage: sage: from sage.misc.bounded_integer_sequences import _biseq_stresstest
sage: _biseq_stresstest()
==3937== Invalid read of size 8
==3937==    at 0x30A524B0: __gmpn_rshift (rshift.c:55)
==3937==    by 0x48CCB50B: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2611)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937==  Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd
==3937==    at 0x4C29BCF: malloc (vg_replace_malloc.c:296)
==3937==    by 0x142EC8FD: sage_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x142EC9E4: sage_gmp_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x30A439EE: __gmpz_init2 (init2.c:33)
==3937==    by 0x48CCE302: __pyx_f_4sage_4misc_25bounded_integer_sequences_concat_biseq (bounded_integer_sequences.c:2256)
==3937==    by 0x4E80BC8: binary_op1 (abstract.c:945)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937== Conditional jump or move depends on uninitialised value(s)
==3937==    at 0x48CCB5C3: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2828)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
==3937==    by 0x4F3F24F: PyEval_EvalCodeEx (ceval.c:3265)
==3937==    by 0x4F3F378: PyEval_EvalCode (ceval.c:673)
```

The middle one is almost certainly an error, the other two don't look too good either.

### comment:144 in reply to: ↑ 143 Changed 7 years ago by SimonKing

Good that I added the stress test!

I'm getting three valgrind warnings (#15586):

```sage: sage: from sage.misc.bounded_integer_sequences import _biseq_stresstest
sage: _biseq_stresstest()
==3937== Invalid read of size 8
==3937==    at 0x30A524B0: __gmpn_rshift (rshift.c:55)
==3937==    by 0x48CCB50B: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2611)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
==3937==  Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd
==3937==    at 0x4C29BCF: malloc (vg_replace_malloc.c:296)
==3937==    by 0x142EC8FD: sage_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x142EC9E4: sage_gmp_malloc (in /home/vbraun/Sage/git-develop/local/lib/libcsage.so)
==3937==    by 0x30A439EE: __gmpz_init2 (init2.c:33)
==3937==    by 0x48CCE302: __pyx_f_4sage_4misc_25bounded_integer_sequences_concat_biseq (bounded_integer_sequences.c:2256)
==3937==    by 0x4E80BC8: binary_op1 (abstract.c:945)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
[...]
```

Do I understand correctly: It means that `contains_biseq` could read and `concat_biseq` could write stuff in non-allocated memory?

```==3937== Conditional jump or move depends on uninitialised value(s)
==3937==    at 0x48CCB5C3: __pyx_f_4sage_4misc_25bounded_integer_sequences_contains_biseq (bounded_integer_sequences.c:2828)
==3937==    by 0x48CD388A: __pyx_pf_4sage_4misc_25bounded_integer_sequences_2_biseq_stresstest (bounded_integer_sequences.c:8902)
==3937==    by 0x48CD388A: __pyx_pw_4sage_4misc_25bounded_integer_sequences_3_biseq_stresstest (bounded_integer_sequences.c:8038)
==3937==    by 0x4F3E768: call_function (ceval.c:4017)
==3937==    by 0x4F3E768: PyEval_EvalFrameEx (ceval.c:2679)
==3937==    by 0x4F3F24F: PyEval_EvalCodeEx (ceval.c:3265)
==3937==    by 0x4F3F378: PyEval_EvalCode (ceval.c:673)
```

The middle one is almost certainly an error, the other two don't look too good either.

Will the given line numbers (2611, 2256, 2828) be the same on my machine?

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:145 follow-up: ↓ 152 Changed 7 years ago by vbraun

I think they are, but I wouldn't bet anything on it ;-)

### comment:146 follow-up: ↓ 147 Changed 7 years ago by vbraun

1) is `contains_biseq` reading into unallocated memory, and 2) says that the invalid read is just beyond what `concat_biseq` allocated.

### comment:147 in reply to: ↑ 146 Changed 7 years ago by SimonKing

1) is `contains_biseq` reading into unallocated memory, and 2) says that the invalid read is just beyond what `concat_biseq` allocated.

Sigh. Could be that I don't even see where 1), 2) and 3) are in the valgrind log...

Concerning the invalid read of size 8 by `__gmpn_rshift`: I think I recall that at some point I wanted to avoid case distinctions and thus allowed rshifting memory that is partially uninitialised, since after the shift I would only access the bytes that were allocated. Probably this is improper and should be fixed.

Concerning `Address 0xecfe560 is 0 bytes after a block of size 16 alloc'd`: I thought this means I am writing into non-allocated memory. But by your new comment, I understand that this just indicates where the memory accessed by the previous item (`__gmpn_rshift`) was allocated. I guess I need an explanation.

Concerning `Conditional jump or move depends on uninitialised value(s)`: Is this independent of the two previous points? Or does it concern the same example and the same memory location?

### comment:148 follow-up: ↓ 149 Changed 7 years ago by vbraun

With the updated valgrind they are the only errors while running the testsuite(!). Trivial to spot. It's possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;-)

### comment:149 in reply to: ↑ 148 Changed 7 years ago by SimonKing

With the updated valgrind they are the only errors while running the testsuite(!).

Wow, that's excellent news (except for my code `:-/`)!

Trivial to spot. It's possible that they are false positives if you use only certain bit positions, but zeroing out the memory doesn't cost much in terms of runtime and can save us a lot of work if your calculations about bit patterns are off by one ;-)

True. Anyway, to fix it I probably need answers to my questions from comment:147 in order to know exactly what to do.

### comment:150 Changed 7 years ago by vbraun

In the first one you are reading past the end of the buffer, from unallocated memory. Either you need to allocate enough (round up) or also handle the cases where you have less than a full 8 bytes. Even though it's highly unlikely, reading past the end could even result in a segfault if the process space happens to end there.

### comment:151 Changed 7 years ago by fredrik.johansson

If it's an "invalid read of size 8" and "0 bytes after a block" in mpn_copyi, chances are it's not a bug. GMP/MPIR will sometimes read 16 aligned bytes at a time when there are just 8 bytes left to copy, on certain architectures where this is safe. This might be the case in mpn_rshift too.

If you're absolutely sure that your code should be correct there, you could try compiling MPIR with assembly optimizations disabled and see if the error goes away. Or easier, temporarily replace the mpn_rshift in your code with a plain C implementation (e.g. the one in mpn/generic/rshift.c in MPIR).

The "Conditional jump or move depends on uninitialised value" is absolutely a bug.

### comment:152 in reply to: ↑ 145 Changed 7 years ago by SimonKing

I think they (the line numbers, SK) are (the same on different machines, SK), but I wouldn't bet anything on it ;-)

Hmm. The conditional jump occurs in line 2828 of the c-file, but this is:

```  /* "sage/misc/bounded_integer_sequences.pyx":398
 *             n += S1.itembitsize
 *     mpz_clear(tmp)
 *     return -1             # <<<<<<<<<<<<<<
 *
 * cdef int index_biseq(biseq_t S, int item, size_t start) except -2:
 */   #### Here is 2828
  __pyx_r = -1;
  goto __pyx_L0;
```

Hm. Volker, could you provide the location of the conditional jump that is giving you the warning?

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:153 Changed 7 years ago by SimonKing

• Status changed from needs_review to needs_work
• Work issues set to Rework by using Sage's existing bitset code

I had a chat with Nathann today. He asked me about the data structure used by GMP, which is mainly `long*` (aka "limb"), plus some information on the number of limbs used and the number of limbs allocated. `long*` is also used by `sage.misc.bitset`.

What I am implementing here is in fact quite similar to bitsets (one item of a bounded integer sequence corresponds to a chunk of bits, but otherwise it is the same). So, he suggested not to duplicate the code but to share it.

I have looked at the bitset code in the past, but apparently it was long before I implemented bounded integer sequences: Now I realize that much of what I do here is actually done in `sage.misc.bitset`! So, definitely I should reuse the code. Hence, marking it as "needs work" for now, as the code will totally change.

Suggestion: I'll add to `sage.misc.bitset` what is not already there, and then `sage.misc.bounded_integer_sequence` will not provide all the boilerplate code, but only provide the Python layer to comfortably use bounded integer sequences.

### comment:154 Changed 7 years ago by SimonKing

I do need a bit more for `biseq_t` than what is provided by `bitset_t`. For example, I need to store the information of how many bits are required to store one item of the list. But anyway, I will now try to rewrite my code.

### comment:155 Changed 7 years ago by git

• Commit changed from 6dfb1cb1c6dc5704e2a8918cf2aeabdd8dbef69a to 47c639d55d3c07670b8b52a629f7cf86a8beb7ce

Branch pushed to git repo; I updated commit sha1. New commits:

 ​bfd7898 `Merge branch 'develop' into t/15820/ticket/15820, since apparently the bitset code changed` ​47c639d `Rewrite bounded integer sequences using sage.misc.bitset`

### comment:156 Changed 7 years ago by SimonKing

• Status changed from needs_work to needs_review
• Work issues Rework by using Sage's existing bitset code deleted

At Nathann's suggestion, I am now reusing part of the existing code from `sage.misc.bitset`. The main advantage is that (in contrast to the `mpz_*` functions) leading zeros will not be silently removed, which means that I could avoid many special cases.
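
The point about leading zeros can be illustrated with plain Python integers. This is a hypothetical sketch of an mpz-style packing (`pack` is a made-up helper, not the actual Sage code): since an integer has no leading zeros, trailing zero items silently disappear, and the length information must be tracked separately.

```python
# Hypothetical sketch: packing a sequence into a single integer, as an
# mpz-based representation effectively does.  An integer has no leading
# zeros, so trailing zero items silently disappear.
def pack(items, itembitsize):
    n = 0
    for i, x in enumerate(items):
        n |= x << (i * itembitsize)
    return n

assert pack([3, 0, 0], 4) == pack([3], 4)  # the two sequences collide
```

A bitset-backed representation of fixed bit length does not have this problem, which is why so many special cases could be dropped.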

Hopefully Nathann will find the new version of the code a lot less frightening than the old version...

Needs review again, and soonish I will update the timings that have occasionally been mentioned in comments above.

### comment:157 Changed 7 years ago by SimonKing

Let us compare the performance tests from comment:83 (also adding tests for pickling) with the new code version (I am testing on the same laptop, thus, hopefully a comparison makes sense):

```sage: from sage.misc.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.15 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.35 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
1000000 loops, best of 3: 1.88 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 70.5 µs per loop
```

--> All became faster

```sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.43 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 1.84 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.41 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 304 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 666 ns per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.2 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.77 µs per loop
sage: %timeit x = S3.list()
10000 loops, best of 3: 112 µs per loop
```

--> Became faster at least for short sequences, which is the case I am mainly interested in

```sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 1.49 µs per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 1.52 µs per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 1.52 µs per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 2.29 µs per loop
```

--> All became faster

```sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 2.5 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 62.2 µs per loop
```

--> Became faster for the shorter sequence

```sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 340 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 345 ns per loop
```

--> All became faster

```sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S1y==S1!=S1x, S1y is not S1
(True, True)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0z2.startswith(S0)", number=1000000)
1000000 loops, best of 3: 210 ns per loop
sage: timeit("S0z2.startswith(S0x)", number=1000000)
1000000 loops, best of 3: 211 ns per loop
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 368 ns per loop
sage: timeit("S1z2.startswith(S1)", number=1000000)
1000000 loops, best of 3: 200 ns per loop
sage: timeit("S1z2.startswith(S1x)", number=1000000)
1000000 loops, best of 3: 199 ns per loop
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 542 ns per loop
sage: timeit("S2z2.startswith(S2)", number=1000000)
1000000 loops, best of 3: 219 ns per loop
sage: timeit("S2z2.startswith(S2x)", number=1000000)
1000000 loops, best of 3: 205 ns per loop
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 2.33 µs per loop
sage: timeit("S3z2.startswith(S3)", number=1000000)
1000000 loops, best of 3: 963 ns per loop
sage: timeit("S3z2.startswith(S3x)", number=1000000)
1000000 loops, best of 3: 201 ns per loop
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 2.36 ms per loop
```

--> All but the last became faster, which is excellent for my applications, as I am using subsequence containment tests fairly often.

Additional timings for pickling, comparing Python lists with bounded integer sequences:

```sage: %timeit loads(dumps(L0))
10000 loops, best of 3: 39.5 µs per loop
sage: %timeit loads(dumps(L1))
10000 loops, best of 3: 43.9 µs per loop
sage: %timeit loads(dumps(L2))
10000 loops, best of 3: 65.5 µs per loop
sage: %timeit loads(dumps(L3))
100 loops, best of 3: 2.13 ms per loop
sage: %timeit loads(dumps(S0))
10000 loops, best of 3: 68.7 µs per loop
sage: %timeit loads(dumps(S1))
10000 loops, best of 3: 72.1 µs per loop
sage: %timeit loads(dumps(S2))
10000 loops, best of 3: 87 µs per loop
sage: %timeit loads(dumps(S3))
1000 loops, best of 3: 1.06 ms per loop

--> Pickling of short bounded integer sequences has an overhead, but longer sequences are pickled faster, due to the more compact representation.

I think that one can say that the new code version results in an overall performance gain, and I suppose it is less frightening now.

Last edited 7 years ago by SimonKing (previous) (diff)

### comment:158 Changed 7 years ago by git

• Commit changed from 47c639d55d3c07670b8b52a629f7cf86a8beb7ce to e3260e2d9f5bf0483770cc5235c00cd24c4eb310

Branch pushed to git repo; I updated commit sha1. New commits:

 ​e3260e2 `Typographical improvements`

### comment:159 Changed 7 years ago by git

• Commit changed from e3260e2d9f5bf0483770cc5235c00cd24c4eb310 to aac3a049805ab29aea8a0d3439c1eadf6605eab1

Branch pushed to git repo; I updated commit sha1. New commits:

 ​aac3a04 `Fix cornercase in unpickling`

### comment:160 Changed 7 years ago by git

• Commit changed from aac3a049805ab29aea8a0d3439c1eadf6605eab1 to b182ba27bdaeb0f066daf1b575db3588b3195f89

Branch pushed to git repo; I updated commit sha1. New commits:

 ​b182ba2 `Use mpn_l/rshift only with nonzero shift`

### comment:161 follow-up: ↓ 164 Changed 6 years ago by jdemeyer

Just randomly looking through this patch...

I don't like to put this in `src/sage/misc`. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset also should be in there). What do you think of `src/sage/data_structures`?

### comment:162 follow-up: ↓ 165 Changed 6 years ago by jdemeyer

Also, instead of `allocate_biseq()`, I would use `biseq_init()` to be more analogous to `mpz_init()` and `bitset_init()`. More generally, replace `foo_biseq()` by `biseq_foo()`.

### comment:163 follow-up: ↓ 166 Changed 6 years ago by jdemeyer

Please don't `cdef extern from "gmp.h"`, use `from sage.libs.gmp.mpn cimport *` or something.

### comment:164 in reply to: ↑ 161 ; follow-up: ↓ 169 Changed 6 years ago by SimonKing

Just randomly looking through this patch...

I don't like to put this in `src/sage/misc`. There is already too much stuff there. I think we need an additional top-level directory for basic data structures like this (bitset also should be in there). What do you think of `src/sage/data_structures`?

I think this question has been discussed on another ticket that I authored. My impression from the discussion was that "structural" stuff should either go to `sage/structure` or `sage/misc`. It should be the former, if it is about structures that only make sense in Sage (Parents, elements, ...), whereas `sage/misc` is for structure that would also make sense without Sage. I thought that bounded integer sequences belong to the latter, and I would also think that `TripleDict` and `MonoDict` should better be in `sage/misc` (the only reason for them being in `sage/structure` is the fact that they were introduced for coercion, which is Sage specific).

So, I think `sage/misc` is a good place. But I wouldn't mind introducing a third structural module, `sage/data_structures`. On a different ticket?

### comment:165 in reply to: ↑ 162 Changed 6 years ago by SimonKing

Also, instead of `allocate_biseq()`, I would use `biseq_init()` to be more analogous to `mpz_init()` and `bitset_init()`. More generally, replace `foo_biseq()` by `biseq_foo()`.

Ok, makes sense.

### comment:166 in reply to: ↑ 163 ; follow-up: ↓ 167 Changed 6 years ago by SimonKing

Please don't `cdef extern from "gmp.h"`, use `from sage.libs.gmp.mpn cimport *` or something.

Is it available from there? If it is, then I should certainly do it in that way.

### comment:167 in reply to: ↑ 166 ; follow-up: ↓ 170 Changed 6 years ago by SimonKing

Please don't `cdef extern from "gmp.h"`, use `from sage.libs.gmp.mpn cimport *` or something.

Is it available from there? If it is, then I should certainly do it in that way.

`mp_bits_per_limb` is not available from there, hence, I still import it from gmp.h.

For now I will not introduce a new `sage.data_structures`.

### comment:168 Changed 6 years ago by git

• Commit changed from b182ba27bdaeb0f066daf1b575db3588b3195f89 to 1f5a53e7b23e406c553d789f83604cd4a7f90655

Branch pushed to git repo; I updated commit sha1. New commits:

 ​1f5a53e `Change the naming schemes of functions according to conventions in gmp and bitset`

### comment:169 in reply to: ↑ 164 ; follow-ups: ↓ 176 ↓ 177 Changed 6 years ago by jdemeyer

On a different ticket?

It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else (requiring changing import statements and everything). If it were an existing module, things would be different.

The problem with `src/sage/misc` is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category. But it's clear that there are a lot of modules defining data structures in `misc` (but also other stuff like `temporary_file`). I think it makes sense to group these in `src/sage/data_structures` (or another name, whatever) instead of dumping everything in `misc`.

### comment:170 in reply to: ↑ 167 Changed 6 years ago by jdemeyer

`mp_bits_per_limb` is not available from there, hence, I still import it from gmp.h.

No, do cimport it from `sage.libs.gmp.types` and add the declaration there.

### comment:171 Changed 6 years ago by jdemeyer

The documentation of `ctypedef struct biseq` should be expanded. When reading it, it is really not clear to me what the various fields mean (especially `"bla&mask_item" greps onethe item bla starts with` makes no sense) and how the data structure works. If you're trying too hard to get everything on one line, use

```ctypedef struct biseq:
    bitset_t data

    mp_bitcnt_t itembitsize
    ...
```
Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:172 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work

### comment:173 Changed 6 years ago by jdemeyer

`first_bits_equal` -> `biseq_first_bits_equal`

### comment:174 follow-up: ↓ 178 Changed 6 years ago by jdemeyer

Also: I think you use `int` in a lot of places where you really want a `long` or `mp_limb_t` or `mp_size_t` or `Py_ssize_t`.

Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:175 Changed 6 years ago by jdemeyer

`list_to_biseq` seems to leak memory if the `biseq_t` was already initialized. I don't mind, but it should be documented, since it's not obvious without reading the code.

Also in `list_to_biseq`, the branch `if ln>mp_bits_per_limb:` leaks `tmp`.

### comment:176 in reply to: ↑ 169 ; follow-up: ↓ 179 Changed 6 years ago by SimonKing

On a different ticket?

It makes no sense to first introduce a new module on this ticket and then make a new ticket to immediately move that new module somewhere else

Except that the new ticket is not just about moving `sage.misc.bounded_integer_sequences` to `sage.data_structures.bounded_integer_sequences`, but moves a whole lot more.

On THIS ticket, it does not make sense to do the big move from `sage.misc` to `sage.data_structures`, since this ticket is about introducing bounded integer sequences.

### comment:177 in reply to: ↑ 169 Changed 6 years ago by nthiery

The problem with `src/sage/misc` is that it contains a lot of various things. Usually "misc" is used when something doesn't belong to any other category.

Yeah, and most of the time this is stuff that does not really belong to Sage in the first place, and should eventually be moved to upstream libraries. In some sense, it can be seen as a feature that it looks wrong there: it gives an incentive for developers to finally take on the task of actually moving the stuff upstream.

### comment:178 in reply to: ↑ 174 Changed 6 years ago by SimonKing

Also: I think you use `int` in a lot of places where you really want a `long` or `mp_limb_t` or `mp_size_t` or `Py_ssize_t`.

I tried to be consistent. However, I cannot find documentation: what is the purpose of `mp_size_t` versus `mp_bitcnt_t`?

Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:179 in reply to: ↑ 176 ; follow-up: ↓ 181 Changed 6 years ago by jdemeyer

On THIS ticket, it does not make sense to do the big move from `sage.misc` to `sage.data_structures`

Of course not. But I meant to add the new module `bounded_integer_sequences` to `sage.data_structures` (the result being that this would be the only module inside `sage.data_structures`). Putting it in the correct place immediately is better than first putting it somewhere else and then moving it.

### comment:180 Changed 6 years ago by SimonKing

Strangely, in gmp.h, I see that `mp_size_t` is `long long int`, hence, it is a signed type, in contrast to C's `size_t`.

### comment:181 in reply to: ↑ 179 Changed 6 years ago by SimonKing

On THIS ticket, it does not make sense to do the big move from `sage.misc` to `sage.data_structures`

Of course not. But I meant to add the new module `bounded_integer_sequences` to `sage.data_structures` (the result being that this would be the only module inside `sage.data_structures`).

OK. This I could do.

### comment:182 follow-up: ↓ 184 Changed 6 years ago by jdemeyer

I tried to be consistent.

The problem is that `mp_bits_per_limb` is about the number of bits in `mp_limb_t`, not the number of bits in `int`. And using `int` for indices is inconsistent with Python which uses `Py_ssize_t` (which should be the same as `mp_size_t`).

### comment:183 follow-up: ↓ 185 Changed 6 years ago by jdemeyer

I would say: use `mp_size_t` for indices, `mp_limb_t` for the actual integers inside your sequences.

### comment:184 in reply to: ↑ 182 ; follow-up: ↓ 186 Changed 6 years ago by SimonKing

I tried to be consistent.

The problem is that `mp_bits_per_limb` is about the number of bits in `mp_limb_t`, not the number of bits in `int`. And using `int` for indices is inconsistent with Python which uses `Py_ssize_t` (which should be the same as `mp_size_t`).

OK. So, I should use `mp_size_t` for indices. But what should I use `mp_bitcnt_t` for?

### comment:185 in reply to: ↑ 183 ; follow-up: ↓ 187 Changed 6 years ago by SimonKing

I would say: use `mp_size_t` for indices, `mp_limb_t` for the actual integers inside your sequences.

If I am not mistaken, I always use `mp_limb_t` for the limbs. If you find a spot where this is not the case, please tell me.

### comment:186 in reply to: ↑ 184 ; follow-up: ↓ 189 Changed 6 years ago by jdemeyer

But what should I use `mp_bitcnt_t` for?

Any time you need a count of bits (which is always positive!)...

### comment:187 in reply to: ↑ 185 ; follow-up: ↓ 190 Changed 6 years ago by jdemeyer

If you find a spot where this is not the case, please tell me.

```cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
     ^^^
```

### comment:188 Changed 6 years ago by jdemeyer

I would say: when in doubt between `mp_size_t` and `mp_bitcnt_t`, use `mp_size_t`.

When in doubt between `int` and anything else: use the anything else.

### comment:189 in reply to: ↑ 186 ; follow-up: ↓ 193 Changed 6 years ago by SimonKing

But what should I use `mp_bitcnt_t` for?

Any time you need a count of bits (which is always positive!)...

You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number `b` in limb number `l`, then both `b` and `l` should be `mp_size_t`? Currently, I try to consistently use `mp_bitcnt_t` for `b` and `mp_size_t` for `l`.
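
The two kinds of index under discussion can be made concrete with a small sketch (under the assumption of 64-bit limbs; `locate` is a made-up helper, not a function in the branch):

```python
LIMB_BITS = 64  # assumed limb width

def locate(index, itembitsize):
    """Return (limb number l, bit number b) for the item at `index`."""
    bit = index * itembitsize          # absolute bit offset of the item
    return bit // LIMB_BITS, bit % LIMB_BITS

assert locate(10, 30) == (4, 44)   # 300 bits in: limb 4, bit 44
```

Here `l` counts limbs (an index in the sense of the data layout) while `b` counts bits inside one limb, which matches using `mp_size_t` for the former and `mp_bitcnt_t` for the latter.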

Last edited 6 years ago by SimonKing (previous) (diff)

### comment:190 in reply to: ↑ 187 ; follow-ups: ↓ 192 ↓ 194 Changed 6 years ago by SimonKing

If you find a spot where this is not the case, please tell me.

```cdef int biseq_getitem(biseq_t S, mp_size_t index) except -1
     ^^^
```

Rationale: This function does not return a limb, but only one item (and a limb may comprise many items). To keep them apart, I chose a different type. Moreover, I consider `mp_limb_t` as a type that belongs to gmp. But the return type of that function should belong to the "Python world".

### comment:191 Changed 6 years ago by mguaypaq

• Cc mguaypaq removed

### comment:192 in reply to: ↑ 190 Changed 6 years ago by jdemeyer

I consider `mp_limb_t` as a type that belongs to gmp. But the return type of that function should belong to the "Python world".

`int` doesn't belong to the Python world either (don't confuse Cython `int` with Python `int`!)

### comment:193 in reply to: ↑ 189 Changed 6 years ago by jdemeyer

But what should I use `mp_bitcnt_t` for?

Any time you need a count of bits (which is always positive!)...

You mean, in particular, that I should *not* use it for an index? Hence, if I want to refer to bit number `b` in limb number `l`, then both `b` and `l` should be `mp_size_t`? Currently, I try to consistently use `mp_bitcnt_t` for `b` and `mp_size_t` for `l`.

Using `mp_bitcnt_t` for this is fine; when I said "index", I meant an index as for Python's `__getitem__`.

### comment:194 in reply to: ↑ 190 Changed 6 years ago by jdemeyer

To keep them apart, I chose a different type.

If that's the main motivation, define a new type:

```ctypedef mp_limb_t biseq_item_t
```

### comment:195 Changed 6 years ago by jdemeyer

Also, if `bound` is of type `mp_limb_t`, then the following check is not needed at all:

```mpz_init_set_ui(tmp, bound-1)
ln = mpz_sizeinbase(tmp, 2)
if ln > mp_bits_per_limb:
    raise ValueError("The integer bound {} does not fit into {}".format(bound, mp_bits_per_limb))
```

### comment:196 follow-up: ↓ 197 Changed 6 years ago by SimonKing

One more reason for using "int" as return type: If I use `mp_limb_t`, then `6` is printed as `6L`.

### comment:197 in reply to: ↑ 196 ; follow-up: ↓ 199 Changed 6 years ago by jdemeyer

One more reason for using "int" as return type: If I use `mp_limb_t`, then `6` is printed as `6L`.

That's not one more reason to use `int`, it's one less reason to use `mp_limb_t`. With `mp_limb_signed_t` (which isn't declared but should be in `src/sage/libs/gmp/types.pxd`), you shouldn't have this problem.

### comment:198 Changed 6 years ago by jdemeyer

But with a better specification of the `biseq` struct (see comment:171) I might give a better recommendation about which types to use.

### comment:199 in reply to: ↑ 197 Changed 6 years ago by SimonKing

One more reason for using "int" as return type: If I use `mp_limb_t`, then `6` is printed as `6L`.

That's not one more reason to use `int`, it's one less reason to use `mp_limb_t`. With `mp_limb_signed_t` (which isn't declared but should be in `src/sage/libs/gmp/types.pxd`), you shouldn't have this problem.

There may be a problem with either `int` or `mp_limb_signed_t`: If the bound for the integer sequence is very large, then a single item would fill a whole `mp_limb_t`. However, this would not fit into a signed int/limb.

So, at the moment, it seems to me that `ctypedef mp_limb_t biseq_item_t` is preferred. Since the `__repr__` method is not time-critical, I suggest to do the transformation from `mp_limb_t` to a nicely printed type there. I would accept that `__getitem__` returns something that is not so nicely printed.

### comment:200 Changed 6 years ago by git

• Commit changed from 1f5a53e7b23e406c553d789f83604cd4a7f90655 to ef69d1e6eb30a9551bb45d1e91f6180454d78b86

Branch pushed to git repo; I updated commit sha1. New commits:

 ​ef69d1e `Create sage.data_structures, move biseq into it, and amend types in biseq`

### comment:201 Changed 6 years ago by git

• Commit changed from ef69d1e6eb30a9551bb45d1e91f6180454d78b86 to 83be138062f619a237e86650807c44b38eec94e5

Branch pushed to git repo; I updated commit sha1. New commits:

 ​83be138 `Reword documentation of biseq_s`

### comment:202 Changed 6 years ago by SimonKing

• Status changed from needs_work to needs_review

I hope my preceding comments do address your concerns, Jeroen. One more thing: Bitset has `cdef struct bitset_s` and `ctypedef bitset_s bitset_t[1]`. To be consistent with it, I changed `biseq` into `biseq_s`, so that I now have `cdef struct biseq_s` and `ctypedef biseq_s biseq_t[1]`.

### comment:203 Changed 6 years ago by jdemeyer

In the "documentation" of `biseq_s`, this sentence is still way too vague to be useful: `Use bitsets to store the data, which in turn is based on GMP integers`. Like: what is "the data" and how is it stored in the bitset?

And it's not really true that bitset is based on GMP integers. It's just an array of limbs but we use GMP/MPIR `mpn_` functions for some manipulations.

### comment:204 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work

### comment:205 Changed 6 years ago by jdemeyer

Are you using anything from NTL? If not, why the

```include "sage/libs/ntl/decl.pxi"
```

### comment:206 Changed 6 years ago by git

• Commit changed from 83be138062f619a237e86650807c44b38eec94e5 to 1fb819cb133e2b94bfc9222585b8762ee5efad21

Branch pushed to git repo; I updated commit sha1. New commits:

 ​1fb819c `Remove a needless cimport, clarify documentation of biseq_s`

### comment:207 follow-up: ↓ 208 Changed 6 years ago by SimonKing

• Status changed from needs_work to needs_review

The import from ntl has indeed been useless.

Is the description of `biseq_s` clear enough now?

### comment:208 in reply to: ↑ 207 ; follow-up: ↓ 210 Changed 6 years ago by jdemeyer

Is the description of `biseq_s` clear enough now?

Add that the items in the sequence are non-negative integers and that `itembitsize` is in `[1 .. GMP_LIMB_BITS]` (assuming both these statements are true).

And I guess that `mask_item` equals `limb_lower_bits_up(itembitsize)`, where `limb_lower_bits_up` is the function defined in `src/sage/misc/bitset.pxi`.

### comment:209 Changed 6 years ago by git

• Commit changed from 1fb819cb133e2b94bfc9222585b8762ee5efad21 to e166d38d8be72453d42325c5f6a73592ddeb3827

Branch pushed to git repo; I updated commit sha1. New commits:

 ​e166d38 `Further clarification of biseq_s doc.`

### comment:210 in reply to: ↑ 208 Changed 6 years ago by SimonKing

Is the description of `biseq_s` clear enough now?

Add that the items in the sequence are non-negative integers and that `itembitsize` is in `[1 .. GMP_LIMB_BITS]` (assuming both these statements are true).

Done.

And I guess that `mask_item` equals `limb_lower_bits_up(itembitsize)`, where `limb_lower_bits_up` is the function defined in `src/sage/misc/bitset.pxi`.

It could be that you just spotted a bug (corner case). I define it as `((<mp_limb_t>1)<<itembitsize)-1`, which is `limb_lower_bits_down`. That is to say: if `itembitsize` is `mp_bits_per_limb` (which should be allowed), then the mask is 0, but it should have all bits set to 1.

Trying to construct a test to see if it really is a bug...
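
The corner case can be modelled with plain Python integers, assuming x86-style shift semantics where the shift count is reduced modulo the word size (which is what makes the C expression misbehave); `limb_shift` and `mask_item` are illustrative helpers, not the actual code:

```python
LIMB_BITS = 64  # assumed limb width

def limb_shift(x, k):
    # Model a C shift on mp_limb_t: on x86 the count is taken mod 64
    return (x << (k % LIMB_BITS)) & ((1 << LIMB_BITS) - 1)

def mask_item(itembitsize):
    return limb_shift(1, itembitsize) - 1   # the suspect definition

assert mask_item(32) == 2**32 - 1   # fine for smaller item sizes
assert mask_item(64) == 0           # bug: should be all 64 bits set
```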

New commits:

 ​e166d38 `Further clarification of biseq_s doc.`

### comment:211 Changed 6 years ago by SimonKing

Yes, it is a bug:

```sage: B = BoundedIntegerSequence(2^32-1, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: B
<0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0>
```

Thank you for spotting it!

### comment:212 follow-up: ↓ 216 Changed 6 years ago by jdemeyer

Hang on, I'm working on a reviewer patch.

### comment:213 Changed 6 years ago by jdemeyer

Question: what is the reason that you allow a length of `0`, requiring lots of special cases in the code? I.e. why not replace

```    if S.length:
        if R.length:
            bitset_realloc(R.data, S.data.size)
        else:
            bitset_init(R.data, S.data.size)
        mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)
    elif R.length:
        bitset_free(R.data)
```

by

```    bitset_realloc(R.data, S.data.size)
    mpn_copyi(R.data.bits, S.data.bits, S.data.limbs)
```

### comment:214 follow-up: ↓ 218 Changed 6 years ago by jdemeyer

I see, bitset doesn't allow a length of 0. Perhaps this (artificial?) limitation on bitsets should be removed first...

### comment:215 Changed 6 years ago by git

• Commit changed from e166d38d8be72453d42325c5f6a73592ddeb3827 to 3d293b96fa5a584684efbaae8fe667b885618797

Branch pushed to git repo; I updated commit sha1. New commits:

 ​3d293b9 `Fixing a corner case`

### comment:216 in reply to: ↑ 212 ; follow-up: ↓ 219 Changed 6 years ago by SimonKing

Hang on, I'm working on a reviewer patch.

### comment:217 follow-up: ↓ 220 Changed 6 years ago by jdemeyer

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.

### comment:218 in reply to: ↑ 214 Changed 6 years ago by SimonKing

I see, bitset doesn't allow a length of 0.

Exactly. I do need bounded integer sequences of length zero, but then I cannot initialise the corresponding bitset (since it is required to have positive length). Hence the special cases.

### comment:219 in reply to: ↑ 216 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work

No problem, it's just a small fix. Can you leave the branch alone for a while now?

### comment:220 in reply to: ↑ 217 ; follow-up: ↓ 222 Changed 6 years ago by SimonKing

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.

I am not sure if I want this, since in some cases (at least in my applications) I ask `self.data.size` for the bitlength of the sequence, and thus I would need other special cases there.

### comment:221 Changed 6 years ago by SimonKing

OTOH, I just see that in these cases (e.g., concatenation of biseq) I already have special cases for length 0. So, it would not be a loss to re-organise the special casing...

### comment:222 in reply to: ↑ 220 Changed 6 years ago by jdemeyer

...or perhaps easier: when given a length of zero, just make the bitset have length 1 as special case.

I am not sure if I want this, since in some cases (at least in my applications) I ask `self.data.size` for the bitlength of the sequence, and thus I would need other special cases there.

I would prefer not to assume that `data.size == length * itembitsize`. For example, it might turn out to be more efficient to store some extra bits (1 extra limb for example).

### comment:223 Changed 6 years ago by jdemeyer

I will rename `list_to_biseq` -> `biseq_init_list` if you don't mind.

### comment:224 Changed 6 years ago by jdemeyer

I would also prefer to remove `biseq_copy` since it's a dangerous function: it assumes that the `biseq` has been allocated but it throws away the current contents.

Better to split this operation in two: changing `biseq_copy` to `biseq_init_copy` assuming it was not allocated. If you really want something like `biseq_copy`, just use `biseq_dealloc` and `biseq_init_copy`.

### comment:225 Changed 6 years ago by jdemeyer

• Dependencies set to #17195

### comment:226 follow-up: ↓ 228 Changed 6 years ago by SimonKing

Why should there be a dependency on cython upgrade?

### comment:227 Changed 6 years ago by jdemeyer

• Dependencies changed from #17195 to #17195, #17196

### comment:228 in reply to: ↑ 226 Changed 6 years ago by jdemeyer

Why should there be a dependency on cython upgrade?

Only the newest Cython has declarations for the `PySlice` API, so you can use `PySlice_GetIndicesEx()` easily.
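
For reference, the normalization that `PySlice_GetIndicesEx()` performs at the C level is the same one Python exposes as `slice.indices()`:

```python
# Normalize a slice against a sequence of known length, as
# PySlice_GetIndicesEx() does in C.
s = slice(None, -1, 2)             # the [:-1:2] used in the timings above
start, stop, step = s.indices(50)  # clamp against a length-50 sequence
assert (start, stop, step) == (0, 49, 2)
```

With the declarations available, the Cython code can call the C function directly instead of reimplementing this bookkeeping.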

### comment:229 follow-up: ↓ 230 Changed 6 years ago by SimonKing

• Dependencies changed from #17195, #17196 to #17195

As far as I know, I don't use the bitset functions for which equality of size is required. Hence, I don't think #17196 is a dependency.

### comment:230 in reply to: ↑ 229 ; follow-up: ↓ 231 Changed 6 years ago by jdemeyer

As far as I know, I don't use the bitset functions for which equality of size is required.

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

### comment:231 in reply to: ↑ 230 ; follow-up: ↓ 232 Changed 6 years ago by jdemeyer

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

For example, `biseq_concat` is broken in the case where `(S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb`.

### comment:232 in reply to: ↑ 231 Changed 6 years ago by SimonKing

Yes, you don't use them but you should use those functions. It will simplify a lot of things.

For example, `biseq_concat` is broken in the case where `(S1.data.size)%mp_bits_per_limb + (S2.data.size)%mp_bits_per_limb > mp_bits_per_limb`.

Really? OK, I'll try to find an example that exposes the problem.

### comment:233 Changed 6 years ago by SimonKing

```sage: B = BoundedIntegerSequence(2^30-1, [1])
sage: B
<1>
sage: B+B
<1, 1>
```

Where's the problem?

### comment:234 Changed 6 years ago by jdemeyer

Could you try again with a larger value of `1`?

### comment:235 Changed 6 years ago by SimonKing

Here it is:

```sage: B = BoundedIntegerSequence(2^30-1, [2^30-1])
sage: B
<1073741823>
sage: B+B
<1073741823, 3>
```

OK, this requires a fix.
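
What the fix has to guarantee can be sketched with Python integers as stand-ins for the limb array (`concat_packed` is illustrative, not the actual `biseq_concat`): the second sequence must be shifted by the exact bit length of the first, with bits that spill over a limb boundary carried into the next limb.

```python
def concat_packed(a, abits, b, bbits):
    """Concatenate bit-packed sequences: b's bits follow a's abits bits.
    At the C level this shift crosses limb boundaries, so every limb of
    b must be shifted and the spill-over carried into the next limb."""
    return a | (b << abits), abits + bbits

item = 2**30 - 1                      # the 30-bit item from the example
packed, nbits = concat_packed(item, 30, item, 30)
assert packed & (2**30 - 1) == item   # first item intact
assert packed >> 30 == item           # second item intact, no truncation
```

The buggy output `<1073741823, 3>` is precisely what one gets when the carry across the limb boundary is dropped.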

### comment:236 follow-up: ↓ 237 Changed 6 years ago by SimonKing

Do I understand correctly: at #17196, shift and comparison operations should be introduced that behave well with respect to bit sizes, and then these operations should replace the `mpn_*` functions that I am using here?

### comment:237 in reply to: ↑ 236 Changed 6 years ago by jdemeyer

• Dependencies changed from #17195 to #17195, #17196

Do I understand correctly: at #17196, shift and comparison operations should be introduced that behave well with respect to bit sizes, and then these operations should replace the `mpn_*` functions that I am using here?

Yes.

### comment:238 Changed 6 years ago by jdemeyer

• Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
• Modified changed from 10/22/14 14:14:46 to 10/22/14 14:14:46

### comment:239 Changed 6 years ago by jdemeyer

• Commit changed from 3d293b96fa5a584684efbaae8fe667b885618797 to 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8

In `biseq_init_list()`, the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be a feature?

New commits:

8c69daf `Upgrade Cython to 0.21.1`
88e5eca `Merge branch 'ticket/17195' into ticket/15820`

### comment:240 Changed 6 years ago by jdemeyer

Last commit is totally work in progress.

### comment:241 Changed 6 years ago by git

• Commit changed from 88e5ecaa30a54cfadbfe4fc1c0ee6d1840041ee8 to ab2f0c2bf3b68e4e979650f5554d529e5b7ea625

Branch pushed to git repo; I updated commit sha1. New commits:

 ​ab2f0c2 `Various reviewer fixes`

### comment:242 Changed 6 years ago by jdemeyer

New commits:

 ​ab2f0c2 `Various reviewer fixes`

### comment:243 Changed 6 years ago by git

• Commit changed from ab2f0c2bf3b68e4e979650f5554d529e5b7ea625 to 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

83f8a56 `Relax assumptions for bitset functions`
3ae75a1 `Move bitset to sage.data_structures`
7a95511 `Merge branch 'ticket/17196' into HEAD`
815a40d `Various reviewer fixes`

### comment:244 follow-up: ↓ 245 Changed 6 years ago by SimonKing

What exactly does cython.overflowcheck do? Check whether the given arguments fit into the prescribed data types without truncation?

Last edited 6 years ago by SimonKing (previous) (diff)

### comment:245 in reply to: ↑ 244 ; follow-up: ↓ 249 Changed 6 years ago by jdemeyer

What exactly does cython.overflowcheck do?

See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication `totalbitsize = l * itemsize` doesn't overflow.
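In other words, the directive guards C arithmetic inside the function. A rough pure-Python model of what the generated check amounts to for this multiplication (the 64-bit width is an assumption for illustration):

```python
# Sketch of the overflow check Cython inserts for l * itemsize: the product
# is computed in a fixed-width C type, so without the check a huge sequence
# length would silently wrap around. 64-bit width assumed for illustration.

UINT64_MAX = 2**64 - 1

def checked_mul(l, itemsize):
    """Return l*itemsize, raising OverflowError instead of wrapping."""
    result = l * itemsize   # Python ints are unbounded, so test afterwards
    if result > UINT64_MAX:
        raise OverflowError("value too large to fit in 64 bits")
    return result
```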

### comment:246 Changed 6 years ago by git

• Commit changed from 815a40d9a2939ebe6e3e127847c04a8c5d7eaf9a to 8d16a61d2874a38e10233cdf434db3854bd5ee0d

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​8d16a61 `Lots of fixes for bounded integer sequences`

### comment:247 Changed 6 years ago by git

• Commit changed from 8d16a61d2874a38e10233cdf434db3854bd5ee0d to 8d59d141b7090fb6e5f6529b968e224ec0e5f73b

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​8d59d14 `Lots of fixes for bounded integer sequences`

### comment:248 Changed 6 years ago by jdemeyer

• Authors changed from Simon King to Simon King, Jeroen Demeyer

### comment:249 in reply to: ↑ 245 ; follow-up: ↓ 251 Changed 6 years ago by SimonKing

What exactly does cython.overflowcheck do?

See http://docs.cython.org/src/reference/compilation.html: for this application, it checks that the multiplication `totalbitsize = l * itemsize` doesn't overflow.

Aha! So, it does not just check an overflow of the arguments, but it checks overflows happening in the innards of the function. Cool!

### comment:250 Changed 6 years ago by git

• Commit changed from 8d59d141b7090fb6e5f6529b968e224ec0e5f73b to d22c1ca23727402e376c7f772c2f51b8ed6af1f1

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

23762a3 `Import data_structures into global namespace`
4aafd2f `Merge branch 'ticket/17196' into HEAD`
d22c1ca `Lots of fixes for bounded integer sequences`

### comment:251 in reply to: ↑ 249 Changed 6 years ago by jdemeyer

So, it does not just check an overflow of the arguments

In fact, it does not check overflow of the arguments.

Overflow checking of the arguments should happen when calling the function, not in the function itself.

Also note that Cython always does overflow checking when converting from Python types to C types, but not by default between C types.

### comment:252 follow-up: ↓ 253 Changed 6 years ago by SimonKing

New commits:

23762a3 `Import data_structures into global namespace`
4aafd2f `Merge branch 'ticket/17196' into HEAD`
d22c1ca `Lots of fixes for bounded integer sequences`


### comment:253 in reply to: ↑ 252 Changed 6 years ago by jdemeyer

The new Cython package isn't yet merged, so you have to manually follow the instructions at #17195 (don't forget the renaming!)

### comment:254 follow-up: ↓ 256 Changed 6 years ago by jdemeyer

My intention for this ticket is to implement an additional function `biseq_setitem()` to set one item and then implement the `biseq` functions using either `biseq_getitem()/biseq_setitem()` or `bitset` functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.
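The proposed layering can be sketched in pure Python, with a big integer as bit storage. The function names mirror the discussion but the signatures are hypothetical; a slice with arbitrary step, for instance, reduces to a short loop over the two primitives:

```python
# Pure-Python stand-ins for the proposed primitives; the real code
# operates on GMP limbs.

def biseq_getitem(data, index, itembitsize):
    return (data >> (index * itembitsize)) & ((1 << itembitsize) - 1)

def biseq_inititem(data, index, item, itembitsize):
    # Assumes the target slot is still zero, so a plain OR suffices.
    return data | (item << (index * itembitsize))

def biseq_slice_step(data, start, stop, step, itembitsize):
    """Slice with arbitrary step, built only from the two primitives."""
    out, j = 0, 0
    for i in range(start, stop, step):
        out = biseq_inititem(out, j, biseq_getitem(data, i, itembitsize),
                             itembitsize)
        j += 1
    return out
```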

### comment:255 follow-up: ↓ 258 Changed 6 years ago by jdemeyer

In case that you missed it, I am copying this question I asked before:

In `biseq_init_list()`, the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

### comment:256 in reply to: ↑ 254 ; follow-up: ↓ 257 Changed 6 years ago by SimonKing

My intention for this ticket is to implement an additional function `biseq_setitem()` to set one item and then implement the `biseq` functions using either `biseq_getitem()/biseq_setitem()` or `bitset` functions. In cases where this cannot be done, add the required primitives and use those. The end result will hopefully be a lot less code without loss of efficiency.

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

### comment:257 in reply to: ↑ 256 ; follow-up: ↓ 260 Changed 6 years ago by jdemeyer

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

I don't quite understand what you refer to. Which function do you think would become slower?

### comment:258 in reply to: ↑ 255 ; follow-up: ↓ 259 Changed 6 years ago by SimonKing

In case that you missed it, I am copying this question I asked before:

In `biseq_init_list()`, the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). It is documented, and it isn't random.
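Concretely, the documented behaviour amounts to the following (a pure-Python sketch; the helper names are made up):

```python
def mangled_bound(bound):
    """Smallest power of two >= bound, so truncation is a cheap bit mask."""
    b = 1
    while b < bound:
        b <<= 1
    return b

def truncate(item, bound):
    # Items are stored modulo the mangled bound, i.e. only the low bits
    # are kept; this is the "truncation" under discussion.
    return item % mangled_bound(bound)
```

For example, with `bound=27` the mangled bound is 32, so an input item 40 is silently stored as 8.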

### comment:259 in reply to: ↑ 258 ; follow-up: ↓ 261 Changed 6 years ago by jdemeyer

In case that you missed it, I am copying this question I asked before:

In `biseq_init_list()`, the fact that the integers are randomly truncated seems like a bug to me. Is that supposed to be an intentional feature?

What do you mean by "randomly truncated"? The given integers are replaced by their remainder modulo the "mangled" bound (i.e., the smallest power of two that is greater than or equal to the given bound). It is documented, and it isn't random.

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

### comment:260 in reply to: ↑ 257 ; follow-up: ↓ 270 Changed 6 years ago by SimonKing

Are you sure that, e.g., iteration will not be a lot slower when doing almost the same shift operation repeatedly for all items contained in a single limb?

I don't quite understand what you refer to. Which function do you think would become slower?

`BoundedIntegerSequence.__iter__` and `biseq_to_list` would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by `>>` in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by `biseq_getitem`, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item.

I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb-`>>`.
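The strategy described here, fetch a limb once and then peel items off a local word, can be sketched as follows (pure Python with 64-bit limbs; itembitsize is assumed to divide the limb size to keep the sketch short, which the real code does not require):

```python
# Sketch of the iteration strategy: extract all items of one limb with
# plain shifts on a local word, instead of re-shifting the whole sequence
# once per item.

LIMB_BITS = 64

def iter_items(limbs, length, itembitsize):
    """Yield items from a list of 64-bit limbs; itembitsize must divide 64."""
    per_limb = LIMB_BITS // itembitsize
    mask = (1 << itembitsize) - 1
    count = 0
    for limb in limbs:
        for _ in range(per_limb):
            if count == length:
                return
            yield limb & mask      # extract the low item ...
            limb >>= itembitsize   # ... then shift within the local word only
            count += 1
```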

### comment:261 in reply to: ↑ 259 ; follow-up: ↓ 271 Changed 6 years ago by SimonKing

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

You mean one should just raise an error? OK, that's another possibility.

### comment:262 Changed 6 years ago by SimonKing

Why did you introduce

```assert S1.itembitsize == S2.itembitsize
```

?

I think that generally it is not the job of boilerplate functions to do bound checking. If you want to do bound checking, then do so on the level of `BoundedIntegerSequence`, but not on the level of `biseq_t`.

### comment:263 follow-up: ↓ 268 Changed 6 years ago by SimonKing

Perhaps we should consider changing the hash function for bitsets (see the comments for the hash of bounded integer sequences that you removed). But I suppose this should not be done on this ticket.

My summary:

• We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with the exception of the one assert that you introduced. I vote for removing the assert.
• I don't think iteration should use `biseq_getitem` to extract items one by one. Instead, the current idea ("shift enough data so that one limb is full and extract items from this single limb") should be preserved.
• I like your change to `biseq_getitem`, where you avoid using `mpn_rshift` for just one/two limbs. I think the same change could be done in `biseq_slice` in the case `step!=1`, and also in iteration.

Other than that, your changes seem fine to me. I suggest to do the "replacement of one/two limb shifts", unless you beat me to it.

### comment:264 Changed 6 years ago by SimonKing

• Reviewers set to Simon King

### comment:265 Changed 6 years ago by SimonKing

Concerning the assert: The only place in which `biseq_init_concat` is used does in fact check compatibility of bounds, and raises a `ValueError` if they are not compatible. Hence, the assert statement is redundant.

### comment:266 Changed 6 years ago by SimonKing

• Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

### comment:267 Changed 6 years ago by SimonKing

• Commit changed from d22c1ca23727402e376c7f772c2f51b8ed6af1f1 to 41c40cf2c59734e7e665ba9c2350da2930c9cc8e
• Status changed from needs_work to needs_review

From my perspective, it is good to go...

New commits:

 ​41c40cf `Don't use mpn_rshift for shifting two limbs`

### comment:268 in reply to: ↑ 263 Changed 6 years ago by jdemeyer

• We have different functions that assume that the given sequences have equivalent bounds. In most cases, the assumption is not checked, with exception of the one assert that you introduced. I vote for removing the assert.

OK then, no problem.

### comment:269 follow-up: ↓ 275 Changed 6 years ago by jdemeyer

Let's do some benchmarks:

1. With the latest commit:
```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 965 µs per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.73 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 1.1 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.91 ms per loop
```
2. With the code before that commit:
```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 3.87 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3.68 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 4.47 ms per loop
```

### comment:270 in reply to: ↑ 260 Changed 6 years ago by jdemeyer

`BoundedIntegerSequence.__iter__` and `biseq_to_list` would probably become slower. Currently, one limb is shifted with mpn_rshift, and then all items in this limb are extracted by `>>` in this single limb, before mpn_rshifting the next limb. If I understand correctly, you are planning to replace this by `biseq_getitem`, which means that you mpn_rshift a limb and extract the first item of it, then mpn_rshift the same limb again for the second item.

I suppose repeated use of mpn_rshift is slower than a single mpn_rshift followed by repeated single-limb-`>>`.

As you can see, I implemented `biseq_getitem()` much faster than that, so I doubt there is still a slowdown for using repeated calls to `biseq_getitem()`.
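For comparison, a getitem that touches at most the two limbs the item straddles, rather than shifting the whole sequence, looks roughly like this (a pure-Python model with 64-bit limbs, not the actual implementation):

```python
# Sketch of a fast getitem: read only the limb(s) containing the item.

LIMB_BITS = 64

def biseq_getitem(limbs, index, itembitsize):
    bit = index * itembitsize
    limb_index, offset = divmod(bit, LIMB_BITS)
    word = limbs[limb_index] >> offset
    if offset + itembitsize > LIMB_BITS:   # item straddles a limb boundary
        word |= limbs[limb_index + 1] << (LIMB_BITS - offset)
    return word & ((1 << itembitsize) - 1)
```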

### comment:271 in reply to: ↑ 261 Changed 6 years ago by jdemeyer

OK, I didn't literally mean "random", maybe I meant strange and therefore unlikely to be useful. Do you have an application in mind?

You mean one should just raise an error? OK, that's another possibility.

Yes, that is what I mean.

### comment:272 follow-up: ↓ 273 Changed 6 years ago by SimonKing

I just noticed that you removed `biseq_to_list`. Why? Is there no speed penalty for using the iterator instead?

### comment:273 in reply to: ↑ 272 Changed 6 years ago by jdemeyer

I just notice that you removed `biseq_to_list`.

Just use `[biseq_getitem_py(self.data, i) for i in range(self.data.length)]`

### comment:274 Changed 6 years ago by jdemeyer

Also, I think that `biseq_to_list` is something which belongs in the Python interface, not the C interface.

### comment:275 in reply to: ↑ 269 Changed 6 years ago by SimonKing

I only see this comment after posting my previous comment.

Let's do some benchmarks:

1. With the latest commit:
```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 965 µs per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.73 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
625 loops, best of 3: 1.1 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 1.91 ms per loop
```
2. With the code before that commit:
```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(2, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 3.87 ms per loop
sage: B = BoundedIntegerSequence(sys.maxint, [1]*100000)
sage: timeit('B.list()')
125 loops, best of 3: 3.68 ms per loop
sage: timeit('list(B)')
125 loops, best of 3: 4.47 ms per loop
```

OK, excellent, then I think it is ok to drop `biseq_to_list`.

### comment:276 Changed 6 years ago by SimonKing

If I understand correctly, the only remaining issue is whether we silently truncate the input, or raise an error if the given items are too large. I have no preference here, so, you may make it a reviewer patch.

### comment:277 follow-up: ↓ 278 Changed 6 years ago by SimonKing

Sigh. After merging this into #16453, I get several segmentation faults.

### comment:278 in reply to: ↑ 277 Changed 6 years ago by SimonKing

Sigh. After merging this into #16453, I get several segmentation faults.

But fortunately this ticket is not to blame for it `:-)`

### comment:279 follow-up: ↓ 282 Changed 6 years ago by jdemeyer

I am going to leave this ticket alone for at least a few days.

I do think there is still a lot of room for improvement, mainly simplification:

1. Implement `biseq_setitem()` and use that in a lot of places.
2. Implement `mpn_equal_bits_shifted()` to compare bit patterns efficiently for equality and use this for `biseq_max_overlap` and `biseq_contains`.

### comment:280 Changed 6 years ago by jdemeyer

• Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
• Modified changed from 10/23/14 14:20:11 to 10/23/14 14:20:11

### comment:281 Changed 6 years ago by jdemeyer

• Commit changed from 41c40cf2c59734e7e665ba9c2350da2930c9cc8e to 9056cf636b65d30c5c89bc151c7740c1ea1143d3

New commits:

 ​9056cf6 `Fix handling of bound in biseq_init_list`

### comment:282 in reply to: ↑ 279 Changed 6 years ago by SimonKing

1. Implement `biseq_setitem()` and use that in a lot of places.

`BoundedIntegerSequence` was (originally) thought of as being immutable (it is hashable). Hence, so far, I only see application in `biseq_init_list` and `biseq_slice`. OK, that's something...

2. Implement `mpn_equal_bits_shifted()` to compare bit patterns efficiently for equality and use this for `biseq_max_overlap` and `biseq_contains`.

+1.

### comment:283 follow-up: ↓ 284 Changed 6 years ago by SimonKing

Would it be acceptable that `biseq_setitem(S,index,item)` assumes that `S[index]` is zero before the assignment? This condition holds in the two applications I can think of.

Of course, I could additionally provide a function that sets a previously existing item to zero.

Last edited 6 years ago by SimonKing (previous) (diff)

### comment:284 in reply to: ↑ 283 Changed 6 years ago by jdemeyer

Would it be acceptable that `biseq_setitem(S,index,item)` assumes that `S[index]` is zero before the assignment?

Then I would call it `biseq_inititem(S, index, item)`.
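The distinction can be sketched in a pure-Python model (names echo the discussion, signatures are hypothetical): `biseq_inititem` may simply OR the item into place because the slot is assumed zero, while a general setitem must clear the slot first.

```python
def biseq_inititem(data, index, item, itembitsize):
    # Correct only if the target slot's bits are currently zero.
    return data | (item << (index * itembitsize))

def biseq_clearitem(data, index, itembitsize):
    # Reset one slot to zero so that inititem becomes valid for it again.
    mask = ((1 << itembitsize) - 1) << (index * itembitsize)
    return data & ~mask

def biseq_setitem(data, index, item, itembitsize):
    # The general case: clear, then init.
    return biseq_inititem(biseq_clearitem(data, index, itembitsize),
                          index, item, itembitsize)
```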

### comment:285 Changed 6 years ago by SimonKing

• Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

### comment:286 follow-up: ↓ 289 Changed 6 years ago by jdemeyer

• Commit changed from 9056cf636b65d30c5c89bc151c7740c1ea1143d3 to e9c779ae629dbec356bb8546e3974e2a67050051

It will be more efficient to declare `biseq_inititem` as `cdef inline void`.

New commits:

 ​e9c779a `Simplify code by using biseq_getitem/biseq_inititem`

### comment:287 Changed 6 years ago by jdemeyer

Concerning `enumerate()`: I have no idea if Cython optimizes this. If not, you better just manually keep track of `index`.

### comment:288 follow-up: ↓ 290 Changed 6 years ago by jdemeyer

Why did you change `if item_limb > bound` to `if item > bound`? The former is a C comparison, therefore faster.

### comment:289 in reply to: ↑ 286 Changed 6 years ago by jdemeyer

It will be more efficient to declare `biseq_inititem` as `cdef inline void`.

More generally: if you have `cdef` or `cpdef` functions returning nothing and you don't require exception handling, then using `void` as return type is more efficient (without it, the functions return the Python value `None`).

### comment:290 in reply to: ↑ 288 ; follow-up: ↓ 291 Changed 6 years ago by SimonKing

Why did you change `if item_limb > bound` to `if item > bound`? The former is a C comparison, therefore faster.

Sure. But you did `item_limb = item` without testing whether `item > bound`, which means that `item_limb` could end up below the bound. Hence, you'd have the random truncation that you wanted to avoid.

### comment:291 in reply to: ↑ 290 Changed 6 years ago by jdemeyer

But you did `item_limb = item` without testing whether `item > bound`, which means that item_limb could end up below the bound.

Not true because the assignment `item_limb = item` is a Python -> C conversion which is always checked:

```sage: BoundedIntegerSequence(100, [2^256])
---------------------------------------------------------------------------
OverflowError                             Traceback (most recent call last)
<ipython-input-6-ba7f864a5712> in <module>()
----> 1 BoundedIntegerSequence(Integer(100), [Integer(2)**Integer(256)])

/usr/local/src/sage-config/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.BoundedIntegerSequence.__init__ (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7798)()

/usr/local/src/sage-config/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_list (build/cythonized/sage/data_structures/bounded_integer_sequences.c:5706)()

OverflowError: long int too large to convert
```

### comment:292 Changed 6 years ago by SimonKing

I repeated the timings from comment:157, but still with the `item<=bound` test (so, initialisation time might improve).

```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
```
```sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.45 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
100000 loops, best of 3: 2.1 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
100000 loops, best of 3: 4.28 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
1000 loops, best of 3: 300 µs per loop
```

--> Became much slower

```sage: %timeit x = list(S0)
1000000 loops, best of 3: 1.44 µs per loop
sage: %timeit x = list(S1)
100000 loops, best of 3: 2.18 µs per loop
sage: %timeit x = list(S2)
100000 loops, best of 3: 4.29 µs per loop
sage: %timeit x = list(S3)
1000 loops, best of 3: 286 µs per loop
sage: %timeit x = S0.list()
1000000 loops, best of 3: 697 ns per loop
sage: %timeit x = S1.list()
1000000 loops, best of 3: 1.2 µs per loop
sage: %timeit x = S2.list()
100000 loops, best of 3: 2.78 µs per loop
sage: %timeit x = S3.list()
10000 loops, best of 3: 135 µs per loop
```

--> not very conclusive

```sage: timeit("x=S0[:-1]", number=100000)
100000 loops, best of 3: 815 ns per loop
sage: timeit("x=S1[:-1]", number=100000)
100000 loops, best of 3: 846 ns per loop
sage: timeit("x=S2[:-1]", number=100000)
100000 loops, best of 3: 845 ns per loop
sage: timeit("x=S3[:-1]", number=100000)
100000 loops, best of 3: 1.62 µs per loop
sage: timeit("x=S2[:-1:2]", number=10000)
10000 loops, best of 3: 1.55 µs per loop
sage: timeit("x=S3[:-1:2]", number=10000)
10000 loops, best of 3: 49.8 µs per loop
```

--> Became much faster

```sage: timeit("x=S0[1]", number=1000000)
1000000 loops, best of 3: 354 ns per loop
sage: timeit("x=S0[4]", number=1000000)
1000000 loops, best of 3: 355 ns per loop
sage: timeit("x=S3[1]", number=1000000)
1000000 loops, best of 3: 339 ns per loop
sage: timeit("x=S3[4500]", number=1000000)
1000000 loops, best of 3: 346 ns per loop
```

--> Perhaps became slower.

What does that all mean?

For my applications, not very much, actually: what I care about is not initialisation from lists, direct item access, or slices with a step different from one. What I do care about is:

• Testing for subsequence containment, comparison and hash
• Slicing with step one
• Concatenation.

So, I leave it up to you whether clarity of code is more important than the above slow-down...

Last edited 6 years ago by SimonKing (previous) (diff)

### comment:293 follow-up: ↓ 295 Changed 6 years ago by jdemeyer

If you do timings, please keep in mind comment:286.

### comment:294 Changed 6 years ago by SimonKing

Right. When using `item_limb`, the timings for initialisation from lists become good:

```sage: %timeit x = BoundedIntegerSequence(8, L0)
1000000 loops, best of 3: 1.27 µs per loop
sage: %timeit x = BoundedIntegerSequence(16, L1)
1000000 loops, best of 3: 1.51 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L2)
1000000 loops, best of 3: 1.88 µs per loop
sage: %timeit x = BoundedIntegerSequence(32, L3)
10000 loops, best of 3: 65.5 µs per loop
```

### comment:295 in reply to: ↑ 293 Changed 6 years ago by SimonKing

If you do timings, please keep in mind comment:286.

The timings are after using `inline void`.

### comment:296 Changed 6 years ago by git

• Commit changed from e9c779ae629dbec356bb8546e3974e2a67050051 to 7d258591bab37466eb6392443ed2e84cf699102c

Branch pushed to git repo; I updated commit sha1. New commits:

acf2a75 `Declare some functions as 'cdef inline void'`
7d25859 `Speed-up for testing bounds`

### comment:297 follow-up: ↓ 303 Changed 6 years ago by jdemeyer

I know it's really a detail, but I intentionally used `repr(item)` to display the error message

```raise ValueError("list item %r larger than %s" % (item, bound))
```

This way, the error message refers to the list item as given by the user, which might be some strange Python type which prints differently.

### comment:298 Changed 6 years ago by jdemeyer

Something else: the implementation of `biseq_getitem_py` using `PyInt_FromSize_t` gives a good reason to have `biseq_item_t == size_t`. Because of this, I would propose to remove the `biseq_item_t` type again and replace it by an explicit `size_t`.

### comment:299 Changed 6 years ago by git

• Commit changed from 7d258591bab37466eb6392443ed2e84cf699102c to 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e

Branch pushed to git repo; I updated commit sha1. New commits:

 ​8a10b73 `Use plain size_t, not biseq_item_t`

### comment:300 Changed 6 years ago by git

• Commit changed from 8a10b734c7f81e44bb401c1fec2dc82bd0e98d5e to fa7cde09b93359371f236ce6da13bb8f5a947a08

Branch pushed to git repo; I updated commit sha1. New commits:

 ​fa7cde0 `Use original data in error message`

### comment:301 Changed 6 years ago by git

• Commit changed from fa7cde09b93359371f236ce6da13bb8f5a947a08 to 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb

Branch pushed to git repo; I updated commit sha1. New commits:

 ​6e03f19 `Code and speed improvement for biseq containment test`

### comment:302 Changed 6 years ago by SimonKing

I have implemented `mpn_equal_bits_shifted`. I do not use `mpn_rshift` followed by `mpn_cmp`, since this would mean touching each limb twice (once for shifting, once for comparison), even if the difference occurs early in the sequence. Of course, using such a function simplifies the code of `biseq_contains` and `biseq_max_overlap` considerably.
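The idea can be sketched in pure Python over 64-bit limbs (illustrative only; the real function works on raw `mp_limb_t` arrays): compare `b2 >> shift` with `b1` limb by limb, constructing each shifted limb on the fly and bailing out at the first mismatch, so every limb is read at most once.

```python
# Sketch of the early-exit comparison behind mpn_equal_bits_shifted.

LIMB_BITS = 64
LIMB_MASK = 2**64 - 1

def equal_bits_shifted(limbs1, limbs2, shift, nbits):
    """True if the low nbits of limbs1 equal limbs2 >> shift."""
    for i in range((nbits + LIMB_BITS - 1) // LIMB_BITS):
        bit = shift + i * LIMB_BITS
        q, r = divmod(bit, LIMB_BITS)
        word = limbs2[q] >> r
        if r and q + 1 < len(limbs2):
            word |= (limbs2[q + 1] << (LIMB_BITS - r)) & LIMB_MASK
        remaining = nbits - i * LIMB_BITS
        mask = LIMB_MASK if remaining >= LIMB_BITS else (1 << remaining) - 1
        if (limbs1[i] ^ word) & mask:
            return False   # early exit: each limb is touched at most once
    return True
```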

Here I redo the containment tests from comment:157, giving timings with the new code and (in brackets) with the previous code (I would not compare directly with comment:157, since the code has changed too much since then).

```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: L0 = [randint(0,7) for i in range(5)]
sage: L1 = [randint(0,15) for i in range(15)]
sage: L2 = [randint(0,31) for i in range(50)]
sage: L3 = [randint(0,31) for i in range(5000)]
sage: T0 = tuple(L0); T1 = tuple(L1); T2 = tuple(L2); T3 = tuple(L3)
sage: S0 = BoundedIntegerSequence(8, L0)
sage: S1 = BoundedIntegerSequence(16, L1)
sage: S2 = BoundedIntegerSequence(32, L2)
sage: S3 = BoundedIntegerSequence(32, L3)
sage: L0x = [randint(0,7) for i in range(5)]
sage: L1x = [randint(0,15) for i in range(15)]
sage: L2x = [randint(0,31) for i in range(50)]
sage: L3x = [randint(0,31) for i in range(5000)]  # verified that they differ from L0,L1,L2,L3
sage: S0x = BoundedIntegerSequence(8, L0x)
sage: S1x = BoundedIntegerSequence(16, L1x)
sage: S2x = BoundedIntegerSequence(32, L2x)
sage: S3x = BoundedIntegerSequence(32, L3x)
sage: S1y = BoundedIntegerSequence(16, L1)
sage: S2y = BoundedIntegerSequence(32, L2)
sage: S3y = BoundedIntegerSequence(32, L3)
sage: S0z1 = S0+S0
sage: S0z2 = S0+S0x
sage: S1z1 = S1+S1
sage: S1z2 = S1+S1x
sage: S2z1 = S2+S2
sage: S2z2 = S2+S2x
sage: S3z1 = S3+S3
sage: S3z2 = S3+S3x
sage: timeit("S0x in S0z2", number=1000000)
1000000 loops, best of 3: 190 ns per loop  (384 ns per loop)
sage: timeit("S1x in S1z2", number=1000000)
1000000 loops, best of 3: 282 ns per loop  (519 ns per loop)
sage: timeit("S2x in S2z2", number=1000000)
1000000 loops, best of 3: 644 ns per loop  (2.1 µs per loop)
sage: timeit("S3x in S3z2", number=1000)
1000 loops, best of 3: 50.2 µs per loop    (2.36 ms per loop)
```

So, there is a very clear improvement, and this time it concerns a function that I use very extensively in my applications.

Jeroen, I think that with the new commit I addressed all of your concerns, hence, for now I will move on to the follow-up tickets (i.e., cythoned quiver paths and the non-commutative F5 algorithm that is not on trac yet).

### comment:303 in reply to: ↑ 297 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work

(never mind)

Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:304 Changed 6 years ago by jdemeyer

• Status changed from needs_work to needs_review

### comment:305 Changed 6 years ago by jdemeyer

`__iter__` and `biseq_index` should be implemented in terms of `biseq_getitem()` (or a good reason given why this is a bad idea).

Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:306 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work

### comment:307 Changed 6 years ago by jdemeyer

In `biseq_init_list`, iteration is about 10% faster the following way:

```for item in data:
item_limb = item
index += 1
```

than with:

```for index from 0<=index<R.length:
item_limb = data[index]
```

### comment:308 Changed 6 years ago by jdemeyer

`biseq_clearitem` must be added to the `.pxd` file

### comment:309 follow-up: ↓ 320 Changed 6 years ago by jdemeyer

In `biseq_slice`, the case `step == 1` should also be implemented using bitset primitives.

### comment:310 follow-up: ↓ 311 Changed 6 years ago by jdemeyer

Should I make these changes or will you? I'm just asking to avoid duplicate work.

### comment:311 in reply to: ↑ 310 Changed 6 years ago by SimonKing

Should I make these changes or will you? I'm just asking to avoid duplicate work.

I will try today.

### comment:312 follow-up: ↓ 314 Changed 6 years ago by SimonKing

While I am at it, I will also make `biseq_max_overlap`, `biseq_index` and `biseq_contains` return `mp_size_t`, not `int`. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be `mp_size_t`.

### comment:313 follow-up: ↓ 315 Changed 6 years ago by SimonKing

Currently, when calling `B.index(i)`, `i` is compared modulo `B.bound()` with the items of `B`. Taking `i` modulo the bound is currently done in `biseq_index`. I suggest moving this to `B.index`, since I generally want to avoid bound checking and the like in the boilerplate functions.

### comment:314 in reply to: ↑ 312 Changed 6 years ago by jdemeyer

While I am at it, I will also make `biseq_max_overlap`, `biseq_index` and `biseq_contains` return `mp_size_t`, not `int`. After all, they return an index in a bounded integer sequence (or -1 if not found, or -2 on error), and I think we have agreed that indices should be `mp_size_t`.

I hadn't noticed, but yes: that should certainly be changed.

### comment:315 in reply to: ↑ 313 ; follow-up: ↓ 318 Changed 6 years ago by jdemeyer

Currently, when calling `B.index(i)`, `i` is compared modulo `B.bound()` with the items of `B`. Taking `i` modulo the bound is currently done in `biseq_index`. I suggest moving this to `B.index`

I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.
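That behaviour can be sketched on a plain Python list standing in for the packed sequence (the `-1` return convention follows the discussion above; the helper is hypothetical):

```python
def biseq_index(items, item, bound):
    """Index of item in the sequence, or -1 if not found.

    Items outside [0, bound] cannot occur in the sequence, so they are
    treated as "not found" instead of being wrapped modulo the bound.
    """
    if item < 0 or item > bound:
        return -1
    try:
        return items.index(item)
    except ValueError:
        return -1
```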

### comment:316 follow-up: ↓ 317 Changed 6 years ago by jdemeyer

In `biseq_index`, use `size_t` for `item`.

### comment:317 in reply to: ↑ 316 Changed 6 years ago by SimonKing

In `biseq_index`, use `size_t` for `item`.

Since `biseq_getitem` returns a `size_t`, or is there another reason?

### comment:318 in reply to: ↑ 315 Changed 6 years ago by SimonKing

I would remove this modulo operation completely. If the item is too large or too small, treat it as not found.

OK, but I'll do this test in `BoundedIntegerSequence.index`, not in `biseq_index`.

### comment:319 Changed 6 years ago by SimonKing

Should I use `biseq_getitem_py` both in `BoundedIntegerSequence.__iter__` and `BoundedIntegerSequence.list`? Or should one of them use `biseq_getitem` for efficiency?

### comment:320 in reply to: ↑ 309 Changed 6 years ago by SimonKing

In `biseq_slice`, the case `step == 1` should also be implemented using bitset primitives.

Problem: We would need a bitset primitive that shifts a part of a bitset S. Currently, `bitset_rshift` shifts all of S, not just a part of S.

I see four ways to proceed:

• Write a new primitive `bitset_rshift_subset`, which means code duplication
• Add an optional argument to `bitset_rshift`, which means that all places using this function must be changed.
• Change `bitset_rshift` so that no assumptions on the sizes of the source and target are needed: we just shift the minimum of the number of limbs available in the source and the number of limbs that fits into the target.
• Do not use bitset primitives in `biseq_slice`.
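The third option can be illustrated with plain Python integers standing in for bitsets (the helper name is made up for illustration):

```python
# Toy model of a size-agnostic right shift: shift the source, then keep
# only as many bits as fit into the target. Bits shifted outside the
# target are silently discarded, so no size assumptions are needed.
def rshift_into(source, shift, target_bits):
    mask = (1 << target_bits) - 1
    return (source >> shift) & mask

S = 0b101101101101
print(bin(rshift_into(S, 4, 5)))  # only the 5 lowest bits of S >> 4 survive
```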

### comment:321 follow-up: ↓ 326 Changed 6 years ago by SimonKing

PS: My favourite would be to modify `bitset_rshift` (and `bitset_lshift`) to make it robust against reading/writing out of bound, in the sense that `bitset_rshift(R,S,n)` moves a subset of S that fits into R.

Last edited 6 years ago by SimonKing (previous) (diff)

### comment:322 Changed 6 years ago by git

• Commit changed from 6e03f19d4bc2379e44e0b8adc32c13ac6e0bcabb to b5b066dc218d7fb6e5e13eb19cce4720d4a2d057

Branch pushed to git repo; I updated commit sha1. New commits:

 ​b5b066d `Code simplification for biseq_*, bound check for bitset shifts`

### comment:323 Changed 6 years ago by git

• Commit changed from b5b066dc218d7fb6e5e13eb19cce4720d4a2d057 to 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b

Branch pushed to git repo; I updated commit sha1. New commits:

 ​43f78ad `Adding a reference to trac`

### comment:324 Changed 6 years ago by SimonKing

In the new commits I addressed (I hope) all issues that we discussed yesterday and today.

Timings without the new commits:

```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.64 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.57 µs per loop
sage: %timeit B[2:-2]
1000000 loops, best of 3: 1.13 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 13 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 31.5 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 55 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.26 µs per loop
```

And with the new commit:

```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(27, [8, 8, 26, 18, 18, 8, 22, 4, 17, 22, 22, 7, 12, 4, 1, 7, 21, 7, 10, 10])
sage: %timeit B.list()
1000000 loops, best of 3: 1.6 µs per loop
sage: %timeit list(B)
100000 loops, best of 3: 2.62 µs per loop
sage: %timeit B[2:-2]
1000000 loops, best of 3: 1.12 µs per loop
sage: L = [randint(0,7) for _ in range(1000)]
sage: %timeit B = BoundedIntegerSequence(8, L)
100000 loops, best of 3: 11.5 µs per loop
sage: B = BoundedIntegerSequence(8, L)
sage: %timeit B.list()
10000 loops, best of 3: 30.8 µs per loop
sage: %timeit list(B)
10000 loops, best of 3: 64.9 µs per loop
sage: %timeit B[3:800]
1000000 loops, best of 3: 1.27 µs per loop
```

Apparently, creation from a list became faster, iteration over a long list became 10% slower (but in my applications, I do not iterate over the items of a bounded integer sequence, hence I don't care about the slow-down), and everything else did not change significantly.

And the code became clearer, which is a plus. I hope you'll find that my changes to `bitset_rshift` and `bitset_lshift` make sense.

### comment:325 Changed 6 years ago by SimonKing

• Status changed from needs_work to needs_review

### comment:326 in reply to: ↑ 321 Changed 6 years ago by jdemeyer

Given the new code, the comment

```Items will not just be compared by congruence modulo the bound of the
sequence, but by equality.
```

no longer makes sense.

### comment:327 Changed 6 years ago by jdemeyer

I would replace the doc

```If the result of the shift would exceed the size of ``r``, then it
will be cut.
```

by something like

```There are no assumptions on the sizes of ``a`` and ``r``.
Bits which are shifted outside of the resulting bitset are discarded.
```

### comment:328 Changed 6 years ago by jdemeyer

In

```raise ValueError("list item {} larger than {}".format(data[index], bound) )
```

replace `data[index]` by `item`.

### comment:329 Changed 6 years ago by git

• Commit changed from 43f78ad50eee7259ab06d5eec1d4cdb03a43f50b to 7a0dd469527c8a4f39d8d891703fcf5feecda9a1

Branch pushed to git repo; I updated commit sha1. New commits:

 ​7a0dd46 `Some improvements of the doc`

### comment:330 Changed 6 years ago by SimonKing

Done!

New commits:

 ​7a0dd46 `Some improvements of the doc`

### comment:331 Changed 6 years ago by SimonKing

• Status changed from needs_review to needs_work

I found a bug (probably in slicing):

```sage: B1 = BoundedIntegerSequence(8, [0,7])
sage: B2 = BoundedIntegerSequence(8, [2,1,4])
sage: B1[0:1]+B2
<0, 7, 1, 4>
sage: B1[0:1]
<0>
```

My diagnosis is that the top bits of the slice (those exceeding the size) have not been cleared. This doesn't show when printing, but concatenation does rely on the top bits being cleared.
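For illustration, here is a toy model (plain Python ints, 3 bits per item; all helper names are made up) of why stale bits above the logical length break the shift-and-OR style of concatenation used for bounded integer sequences:

```python
ITEM_BITS = 3  # bound 8 -> 3 bits per item

def concat(a, a_len, b):
    # mimic the biseq concatenation: shift b past the a_len items of a, OR in
    return (b << (a_len * ITEM_BITS)) | a

def items(x, n):
    # read back n items of ITEM_BITS bits each
    return [(x >> (i * ITEM_BITS)) & (2**ITEM_BITS - 1) for i in range(n)]

clean = 0b000        # the slice <0> with the bits above its length cleared
dirty = 0b111000     # the same slice, but with a stale item left above it
B2 = 0b100001010     # the sequence <2, 1, 4>, stored little-endian itemwise

print(items(concat(clean, 1, B2), 4))  # [0, 2, 1, 4] -- correct
print(items(concat(dirty, 1, B2), 4))  # [0, 7, 1, 4] -- the reported bug
```

The second call reproduces the shape of the reported output `<0, 7, 1, 4>`: the stale item above the slice's logical length gets OR-ed into the first item of `B2`.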

### comment:332 Changed 6 years ago by git

• Commit changed from 7a0dd469527c8a4f39d8d891703fcf5feecda9a1 to 6723c7e44dbae3118e3dff259fd0e633915b53c9

Branch pushed to git repo; I updated commit sha1. New commits:

 ​6723c7e `Fix highest bits of bitsets after fixing`

### comment:333 Changed 6 years ago by SimonKing

• Status changed from needs_work to needs_review

Fixed and tested...

New commits:

 ​6723c7e `Fix highest bits of bitsets after fixing`

### comment:334 Changed 6 years ago by jdemeyer

For consistency, `biseq_slice` should be renamed `biseq_init_slice`.

### comment:335 Changed 6 years ago by jdemeyer

• Reviewers changed from Simon King to Jeroen Demeyer, Simon King

Modulo this comment about `biseq_init_slice`, I think the public biseq interface looks good.

I have not looked at all the changes of the last few days and I don't feel like doing that now. I do plan to review this ticket in some time with a fresh look (of course, other reviewers are always welcome). In any case, I think the current branch works well enough that you can use it for your other tickets.

### comment:336 Changed 6 years ago by git

• Commit changed from 6723c7e44dbae3118e3dff259fd0e633915b53c9 to b6ef6d5c5faffe4059b1539f00e25dd98b9481c0

Branch pushed to git repo; I updated commit sha1. New commits:

 ​b6ef6d5 `Rename biseq_slice -> biseq_init_slice`

### comment:337 Changed 6 years ago by jdemeyer

In `biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop, int step)`, the `step` argument should also be `mp_size_t`.

### comment:338 Changed 6 years ago by git

• Commit changed from b6ef6d5c5faffe4059b1539f00e25dd98b9481c0 to 0e51c4a8ae1720e301b4d57ee51289de475bf51d

Branch pushed to git repo; I updated commit sha1. New commits:

 ​0e51c4a `Change the type of an argument`

### comment:339 Changed 6 years ago by jdemeyer

• Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
• Modified changed from 10/30/14 10:55:56 to 10/30/14 10:55:56

### comment:340 Changed 6 years ago by jdemeyer

• Commit changed from 0e51c4a8ae1720e301b4d57ee51289de475bf51d to d46a22d394c0c3556bd297427614bbc43140efb3

Resolved easy merge conflict.

New commits:

 ​d46a22d `Merge remote-tracking branch 'origin/develop' into ticket/15820`

### comment:341 Changed 6 years ago by SimonKing

Can the review be finished? #16453 and #17435 depend on it.

### comment:342 Changed 6 years ago by SimonKing

PS: The errors that some patchbots report seem unrelated. And I don't see what the coverage script is complaining about. Coverage of the new stuff is 100%.

### comment:343 Changed 6 years ago by git

• Commit changed from d46a22d394c0c3556bd297427614bbc43140efb3 to 7e5f21754e6c056d71609830729c093ba09b56a6

Branch pushed to git repo; I updated commit sha1. New commits:

 ​7e5f217 `Reorganize logic of bitset shifts`

### comment:344 Changed 6 years ago by git

• Commit changed from 7e5f21754e6c056d71609830729c093ba09b56a6 to 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664

Branch pushed to git repo; I updated commit sha1. New commits:

 ​d0a5c78 `Move bitset_fix() function` ​0c64618 `Documentation fixes`

### comment:345 Changed 6 years ago by git

• Commit changed from 0c646186f3522ab9bf3210ddc9bd3baf7b1b4664 to 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf

Branch pushed to git repo; I updated commit sha1. New commits:

 ​93e64fd `Always raise OverflowError if list item is out of bounds`
Last edited 6 years ago by jdemeyer (previous) (diff)

### comment:346 Changed 6 years ago by git

• Commit changed from 58f5f57bd3d620b571ce2c8d1e8da372f5171fdf to 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​411fe4e `Simplify/fix the logic of some biseq functions`

### comment:347 Changed 6 years ago by git

• Commit changed from 411fe4e04b87e5fdf87e4c3f4052e6747e09eca8 to 8db9f6542a54ead67c1df3d069103b620e85eea6

Branch pushed to git repo; I updated commit sha1. New commits:

 ​cf166a6 `Generalise biseq_max_overlap() and rename as biseq_reverse_contains()` ​8db9f65 `Various improvements to BoundedIntegerSequence`

### comment:348 Changed 6 years ago by jdemeyer

My review of this is done now. I'll let you look at my commits.

### comment:349 Changed 6 years ago by jdemeyer

I have made one important functional change: you should change `biseq_max_overlap(A, B)` to `biseq_reverse_contains(B, A, 1)` in the follow-up tickets.

I also fixed a few bugs and changed some exceptions but there should be no other functional changes.

### comment:350 follow-up: ↓ 351 Changed 6 years ago by jdemeyer

• Status changed from needs_review to needs_work
• Work issues set to interrupt handling

We are still missing `sig_*` functions to allow interrupts. But I prefer for you to review my changes before adding those.

### comment:351 in reply to: ↑ 350 ; follow-up: ↓ 358 Changed 6 years ago by SimonKing

Dear Jeroen,

thank you for reviewing!

We are still missing `sig_*` functions to allow interrupts. But I prefer for you to review my changes before adding those.

• You changed the location of the (old) function `bitset_fix`. Why? Won't that increase the likelihood of a merge conflict?
• When reading the function name `biseq_reverse_contains`, I'd expect that it either checks for containment of a reverse subsequence, or it checks for containment of a subsequence starting at the end of the supersequence. Both is not the case. What this function determines is an overlap of the end of first and the beginning of the second argument. That said, I don't think the new function name is worse than the old.

The other changes seem at least reasonable, some are needed. Thanks for spotting it.

In any case, there will be a merge conflict with the next follow-up ticket. But I guess it can hardly be avoided. I think we can now take care of signal handling.

### comment:352 Changed 6 years ago by SimonKing

And one thing is not correct in the description of `biseq_reverse_contains`:

```  Return ``i`` such that the bounded integer sequence ``S1`` starts with
the sequence ``S2[i:]``, where ``start <= i < S2.length``, or return
``-1`` if no such ``i`` exists.
```

If you apply the function (whatever it is named) to `<2,3,2,3,1>` and `<1,2,3,2,3>` (I just noticed that you also exchanged the rôles of `S1` and `S2`!), then I want it to yield the maximal overlap: it should return 1, not 3. That's not clear from the new description.

### comment:353 Changed 6 years ago by SimonKing

Hmmmm. I am less and less happy about the "contains" in the name `biseq_reverse_contains`. Neither is the first argument contained in the second nor the second contained in the first. And even if it were: We are not interested in mere containment, but we want to know whether a terminal segment of one argument coincides with an initial segment of the other argument.

### comment:354 follow-ups: ↓ 359 ↓ 364 Changed 6 years ago by SimonKing

What do you think about the name `biseq_start_of_overlap`, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?

### comment:355 Changed 6 years ago by SimonKing

• Branch changed from u/jdemeyer/ticket/15820 to u/SimonKing/ticket/15820

### comment:356 Changed 6 years ago by SimonKing

• Commit changed from 8db9f6542a54ead67c1df3d069103b620e85eea6 to 63d2693eb4915632e3cb806730a4eb56dc26008e

For the record: With `git trac push 15820`, only the branch of this ticket was changed, but not the sha1. Hence, I have to change it manually.

### comment:357 Changed 6 years ago by SimonKing

By the way: Note that I've formulated error messages such as `ValueError: BoundedIntegerSequence.index(x): x(=32) not in sequence` by taking tuples as good example:

```sage: (1,2,3).index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-e2928ab3b336> in <module>()
----> 1 (Integer(1),Integer(2),Integer(3)).index(Integer(4))

ValueError: tuple.index(x): x not in tuple
```

Your error message is more similar to what is done for lists:

```sage: [1,2,3].index(4)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-e163a1d1b274> in <module>()
----> 1 [Integer(1),Integer(2),Integer(3)].index(Integer(4))

ValueError: 4 is not in list
```

But that's not more than a side-note. I accept your change.

### comment:358 in reply to: ↑ 351 Changed 6 years ago by jdemeyer

• You changed the location of the (old) function `bitset_fix`. Why?

Because I think the new location is better than the old one; it seemed out of place in the list of functions that generate limb patterns.

Won't that increase the likelihood of a merge conflict?

Well, every change increases the likelihood of a merge conflict... But I don't know of any other tickets changing bitset code, so it won't be that bad.

### comment:359 in reply to: ↑ 354 Changed 6 years ago by jdemeyer

What do you think about the name `biseq_start_of_overlap`, and switching back to the old rôle of the arguments (i.e., we search for the smallest possible start index of a terminal segment of S1 that coincides with an initial segment of S2)?

Well, the thing that bothered me most about the old `biseq_max_overlap` function was the implicit start index of 1, which is unexpected (why 1?) and couldn't be changed (unlike `biseq_contains`).

I changed the names and ordering of the arguments to make it more clear that `biseq_contains` and `biseq_reverse_contains` are in fact very similar: the first finds `i` such that `S1[i:]` starts with `S2`, the second finds `i` such that `S1` starts with `S2[i:]`.

I'm certainly open for changes, I just would like the functions to be as intuitive as possible for people not knowing about your application of quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.

### comment:360 Changed 6 years ago by jdemeyer

• Branch changed from u/SimonKing/ticket/15820 to u/jdemeyer/ticket/15820
• Modified changed from 12/05/14 09:28:15 to 12/05/14 09:28:15

### comment:361 Changed 6 years ago by SimonKing

Apparently the trac pages are not available, because of an internal error. So, I'm trying to comment using the sage-dev script.

You wrote:

I'm certainly open for changes, I just would like the functions to be as intuitive as possible for people not knowing about your application of quiver algebras. For example, the word "overlap" doesn't mean anything to me when you talk just about sequences of integers.

Here I disagree. If I have two lists A and B, then the statement "A overlaps with B" makes immediate sense to me. The only open question is whether "A overlaps with B" means that the overlap of the two lists is supposed to be at the end of A or at the end of B.

### comment:362 Changed 6 years ago by jdemeyer

• Commit changed from 63d2693eb4915632e3cb806730a4eb56dc26008e to 0a03d71d9df26c42cf91dab3df51ca549c434718

New commits:

 ​0a03d71 `More descriptive function name for bounded integer sequences`

### comment:363 Changed 6 years ago by jdemeyer

To reduce unneeded merges, I have cherry-picked your commit on top of my branch.

### comment:364 in reply to: ↑ 354 Changed 6 years ago by jdemeyer

What do you think about switching back to the old rôle of the arguments

I think it's a bad idea.

Look at the following (now removed) code from you:

```if other.startswith(self):
return self
cdef mp_size_t i = biseq_max_overlap(self.data, other.data)
```

Note that the case `i == 0` of `biseq_max_overlap(B, A)` corresponds to `biseq_startswith(A, B)`. An unexpected reordering in my opinion.

In my proposal, the three functions `biseq_startswith`, `biseq_contains` and `biseq_reverse_contains` (whatever the name) all check whether some part of `S1` starts with some part of `S2`. The only thing that changes is which parts of `S1` and `S2` are considered.

### comment:365 Changed 6 years ago by jdemeyer

Better proposal: rename `biseq_reverse_contains` as `biseq_startswith_tail`. I think that's clear: check whether a sequence starts with the tail part of a sequence.
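On plain Python lists, the agreed semantics can be sketched as follows (the helper mirrors the C-level function's intent, but is itself only an illustration):

```python
# Return the smallest i >= start such that S1 starts with the tail S2[i:],
# or -1 if no such i exists. The smallest admissible i corresponds to the
# longest overlap of the end of S2 with the beginning of S1.
def startswith_tail(S1, S2, start=0):
    for i in range(max(start, len(S2) - len(S1)), len(S2)):
        if S1[:len(S2) - i] == S2[i:]:
            return i
    return -1

# Example from earlier in the discussion: with S1 = <2,3,2,3,1> and
# S2 = <1,2,3,2,3>, both i=1 and i=3 give an overlap; the maximal
# overlap corresponds to i=1.
print(startswith_tail([2, 3, 2, 3, 1], [1, 2, 3, 2, 3]))  # 1
```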

### comment:366 Changed 6 years ago by git

• Commit changed from 0a03d71d9df26c42cf91dab3df51ca549c434718 to 1560ce8dff16ed1fceabcc5656312f718d874211

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

 ​1560ce8 `Rename biseq_reverse_contains as biseq_startswith_tail`

### comment:367 Changed 6 years ago by SimonKing

OK, I am fine with that change.

### comment:368 Changed 6 years ago by jdemeyer

Thanks for the support!

For me, this ticket is positive_review except for the interrupt stuff.

### comment:369 Changed 6 years ago by SimonKing

Since yesterday evening, I have tried about 20 times to push my changes (implementing the interrupt handling), but it doesn't work. Any idea why?

As usual, the error message is not very helpful:

```> git trac push 15820
Pushing to Trac #15820...
Guessed remote branch: u/SimonKing/ticket/15820
Enter passphrase for key '/home/king/.ssh/id_rsa':
Traceback (most recent call last):
File "/home/king/bin/git-trac", line 18, in <module>
cmdline.launch()
File "/home/king/Sage/git/git-trac-command/git_trac/cmdline.py", line 210, in launch
app.push(ticket_number, remote=args.remote, force=args.force)
File "/home/king/Sage/git/git-trac-command/git_trac/app.py", line 194, in push
self.repo.push(remote, force)
File "/home/king/Sage/git/git-trac-command/git_trac/git_repository.py", line 181, in push
self.git.echo.push('trac', refspec)
File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 341, in meth
return self.execute(git_cmd, *args, **kwds)
File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 98, in execute
popen_stderr=subprocess.PIPE)
File "/home/king/Sage/git/git-trac-command/git_trac/git_interface.py", line 263, in _run
raise GitError(result)
git_trac.git_error.GitError
```

Is that a problem on my side, or on trac?

Anyway, the diff would be

• ## src/sage/data_structures/bounded_integer_sequences.pyx

```
diff --git a/src/sage/data_structures/bounded_integer_sequences.pyx b/src/sage/data_structures/bounded_integer_sequences.pyx
index 7f5e632..ccda54c 100644
```

The affected functions (context abbreviated with `...`; the added lines are the `sig_on()`/`sig_off()`/`sig_check()` calls):

```
cdef bint biseq_init(biseq_t R, mp_size_t l, mp_bitcnt_t itemsize) except -1:
    ...
        totalbitsize = l * itemsize
    else:
        totalbitsize = 1
    sig_on()
    bitset_init(R.data, totalbitsize)
    sig_off()
    R.length = l
    R.itembitsize = itemsize
    R.mask_item = limb_lower_bits_up(itemsize)

cdef bint biseq_init_copy(biseq_t R, biseq_t S) except -1:
    """
    Initialize ``R`` as a copy of ``S``.
    """
    biseq_init(R, S.length, S.itembitsize)
    sig_on()
    bitset_copy(R.data, S.data)
    sig_off()

cdef bint biseq_init_list(biseq_t R, list data, size_t bound) except -1:
    ...
    biseq_init(R, len(data), BIT_COUNT(bound|1))
    sig_check()
    for item in data:
        item_c = item
        if item_c > bound:
            ...

cdef bint biseq_init_concat(biseq_t R, biseq_t S1, biseq_t S2) except -1:
    """
    ...
    The result is written into ``R``, which must not be initialised
    """
    biseq_init(R, S1.length + S2.length, S1.itembitsize)
    sig_on()
    bitset_lshift(R.data, S2.data, S1.length * S1.itembitsize)
    bitset_or(R.data, R.data, S1.data)
    sig_off()

cdef inline bint biseq_startswith(biseq_t S1, biseq_t S2) except -1:
    ...
        return False
    if S2.length == 0:
        return True
    sig_check()
    return mpn_equal_bits(S1.data.bits, S2.data.bits, S2.data.size)

cdef bint biseq_init_slice(biseq_t R, biseq_t S, mp_size_t start, mp_size_t stop, ...):
    ...
    if step == 1:
        # Slicing essentially boils down to a shift operation.
        sig_on()
        bitset_rshift(R.data, S.data, start*S.itembitsize)
        sig_off()
        return 0
    # In the general case, we move item by item.

cdef mp_size_t biseq_contains(biseq_t S1, biseq_t S2, mp_size_t start) except -2:
    ...
    if S2.length == 0:
        return start
    cdef mp_size_t index
    sig_check()
    for index from start <= index <= S1.length-S2.length:
        if mpn_equal_bits_shifted(S2.data.bits, S1.data.bits,
                S2.length*S2.itembitsize, index*S2.itembitsize):
            ...

cdef mp_size_t biseq_startswith_tail(biseq_t S1, biseq_t S2, mp_size_t start) except -2:
    ...
    if S1.length < S2.length - start:
        start = S2.length - S1.length
    cdef mp_size_t index
    sig_check()
    for index from start <= index < S2.length:
        if mpn_equal_bits_shifted(S1.data.bits, S2.data.bits,
                (S2.length - index)*S2.itembitsize, index*S2.itembitsize):
            ...

cpdef BoundedIntegerSequence NewBISEQ(tuple bitset_data, mp_bitcnt_t itembitsize, ...):
    cdef BoundedIntegerSequence out = BoundedIntegerSequence.__new__(BoundedIntegerSequence)
    # bitset_unpickle assumes that out.data.data is initialised.
    biseq_init(out.data, length, itembitsize)
    sig_on()
    bitset_unpickle(out.data.data, bitset_data)
    sig_off()
    return out

def _biseq_stresstest():
    ...
```

### comment:370 Changed 6 years ago by jdemeyer

I can't help with git...

### comment:371 Changed 6 years ago by git

• Commit changed from 1560ce8dff16ed1fceabcc5656312f718d874211 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184

Branch pushed to git repo; I updated commit sha1. New commits:

 ​4e9e1c5 `Add interrupt handling to bounded integer sequences`

### comment:372 Changed 6 years ago by jdemeyer

• Status changed from needs_work to needs_review
• Work issues interrupt handling deleted

### comment:373 follow-up: ↓ 378 Changed 6 years ago by SimonKing

Questions:

• Is the stress test really running forever?? It is supposed to take a few seconds. I will test in a few minutes.
• When should one use `sig_check()` and when `sig_on()/sig_off()`? I thought one uses `sig_check()` when an error or a return might occur that is not caused by a keyboard interrupt. But perhaps I am mistaken.

### comment:374 follow-up: ↓ 375 Changed 6 years ago by SimonKing

Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!

### comment:375 in reply to: ↑ 374 ; follow-up: ↓ 377 Changed 6 years ago by SimonKing

Aha, I see! You really changed the stress test so that it would run forever, but test that it can be interrupted. Good idea!

Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.

### comment:376 Changed 6 years ago by SimonKing

Should we address the interrupt test in a different way? If not, I think it is a positive review from my side. But I'd appreciate if you could answer my question on sig_on/sig_off vs. sig_check.

### comment:377 in reply to: ↑ 375 Changed 6 years ago by jdemeyer

Unfortunately it does NOT test that bounded integer sequence operations are interruptible. I tested a couple of times, and the exception always occurs in the randint function.

I also tested a bit more and the exception indeed happens most often in `randint()` but also sometimes in other places.

### comment:378 in reply to: ↑ 373 Changed 6 years ago by jdemeyer

• When should one use `sig_check()` and when `sig_on()/sig_off()`?

### comment:379 follow-up: ↓ 380 Changed 6 years ago by SimonKing

Should we add tests in the spirit of this:

```sage: from sage.data_structures.bounded_integer_sequences import BoundedIntegerSequence
sage: B = BoundedIntegerSequence(4,[1,2,3,2,3,2,3])
sage: while 1:
....:     C = B+B
....:
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-3-dda173c83bbe> in <module>()
----> 1 while Integer(1):
2     C = B+B
3

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sage_python_check_interrupt (build/cythonized/sage/ext/c_lib.c:1683)()

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()

KeyboardInterrupt:
sage: while 1:
C = B+B
....:
^C---------------------------------------------------------------------------
KeyboardInterrupt                         Traceback (most recent call last)
<ipython-input-4-dda173c83bbe> in <module>()
1 while Integer(1):
----> 2     C = B+B
3

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/data_structures/bounded_integer_sequences.so in sage.data_structures.bounded_integer_sequences.biseq_init_concat (build/cythonized/sage/data_structures/bounded_integer_sequences.c:7632)()

/home/king/Sage/git/sage/local/lib/python2.7/site-packages/sage/ext/c_lib.so in sage.ext.c_lib.sig_raise_exception (build/cythonized/sage/ext/c_lib.c:769)()

KeyboardInterrupt:
```

As you can see, the interruption occurs in the correct place already in the second try.

### comment:380 in reply to: ↑ 379 ; follow-up: ↓ 381 Changed 6 years ago by jdemeyer

Should we add tests in the spirit of this:

We could, but I'm not sure it is worth the trouble.

### comment:381 in reply to: ↑ 380 Changed 6 years ago by SimonKing

• Status changed from needs_review to positive_review

Should we add tests in the spirit of this:

We could, but I'm not sure it is worth the trouble.

Then I guess I can set this to positive review, right?

### comment:382 Changed 6 years ago by vbraun

• Branch changed from u/jdemeyer/ticket/15820 to 4e9e1c51f6524bfa2d2c55d197b82a983dd01184
• Resolution set to fixed
• Status changed from positive_review to closed

### comment:383 follow-up: ↓ 384 Changed 6 years ago by tmonteil

• Commit 4e9e1c51f6524bfa2d2c55d197b82a983dd01184 deleted

I just saw this ticket on the trac timeline; I wonder whether two similar objects are being developed in parallel by disjoint sets of people, see #17013.

### comment:384 in reply to: ↑ 383 Changed 6 years ago by SimonKing

Indeed there is an obvious overlap. If I understand correctly, #17013 uses `char *` as underlying data structure, which means they are restricted to an alphabet of length at most 255. I have experimented with a similar implementation, too, but found that an implementation based on gmp gives better speed than what I was able to obtain with `char*`.
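The bit-packing idea common to both approaches can be sketched in plain Python (helper names are made up; Python's arbitrary-precision integers stand in for the gmp-backed bitsets here):

```python
def pack(items, bound):
    """Pack a list of integers 0 <= item <= bound into one big integer,
    using a fixed number of bits per item."""
    bits = max(1, bound.bit_length())
    data = 0
    for pos, item in enumerate(items):
        data |= item << (pos * bits)
    return data, bits, len(items)

def unpack(data, bits, length):
    """Read the items back out of the packed integer."""
    mask = (1 << bits) - 1
    return [(data >> (pos * bits)) & mask for pos in range(length)]

packed = pack([8, 8, 26, 18, 18, 8], 27)
print(unpack(*packed))  # round-trips to the original list
```

Unlike a `char*`-based representation, the per-item width is chosen from the bound, so the alphabet is not limited to 255 symbols.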