Opened 18 months ago
Last modified 6 months ago
#25633 needs_review enhancement
Speed up SBox module
Reported by:  asante  Owned by:  asante 

Priority:  major  Milestone:  sage8.5 
Component:  cryptography  Keywords:  cython, sbox, days94 
Cc:  tscrim  Merged in:  
Authors:  Friedrich Wiemer  Reviewers:  Travis Scrimshaw 
Report Upstream:  N/A  Work issues:  
Branch:  public/crypto/speedup_sbox25633 (Commits)  Commit:  5126cb813618cc8e17790bc4a77034b382c2aca2 
Dependencies:  #25516  Stopgaps: 
Description (last modified by )
As briefly discussed on ask.sagemath.org (https://ask.sagemath.org/question/42492/hugedelayinsagecryptosboxsboxmethodnonlinearityintroducedinv82/?answer=42505#postid42505), the implementation for computing the linear approximation table can be speed up. Similarly, the difference distribution matrix computation should be faster. Maybe the module should just be converted to cython.
Change History (52)
comment:1 Changed 18 months ago by
 Owner changed from (none) to asante
comment:2 Changed 18 months ago by
 Dependencies set to 25516
comment:3 Changed 18 months ago by
 Dependencies changed from 25516 to #25516
comment:4 Changed 18 months ago by
 Branch set to u/asante/speed_up_lat_computation_for_sboxes
comment:5 Changed 18 months ago by
 Commit set to b8f743b10d42477e4fa21274089af8d4786e3a61
Branch pushed to git repo; I updated commit sha1. New commits:
b8f743b  Reset branch to `develop` to fix git del && git add vs git mv

comment:6 Changed 18 months ago by
I've reset the branch, because the initial "git del && git add" poluted the diff (instead of doing a "git mv").
comment:7 Changed 18 months ago by
 Commit changed from b8f743b10d42477e4fa21274089af8d4786e3a61 to c3c114869d2a06815f6cef0290ed387b0b5f0adf
comment:8 Changed 18 months ago by
 Branch changed from u/asante/speed_up_lat_computation_for_sboxes to u/asante/speed_up_sbox_module
 Commit c3c114869d2a06815f6cef0290ed387b0b5f0adf deleted
comment:9 Changed 18 months ago by
 Commit set to a01c9d889ffd944e6cae5384efbd0d4db9cb5f6b
Branch pushed to git repo; I updated commit sha1. New commits:
a01c9d8  cythonize sbox.py

comment:10 Changed 18 months ago by
So, finally managed to delete the branch from my first attempt ~.~
comment:11 Changed 18 months ago by
 Description modified (diff)
 Summary changed from Speed up LAT computation for SBoxes to Speed up SBox module
comment:12 Changed 18 months ago by
 Commit changed from a01c9d889ffd944e6cae5384efbd0d4db9cb5f6b to 92dd237fb76ceca8b9c843760d27a2335019fa29
Branch pushed to git repo; I updated commit sha1. New commits:
92dd237  skip input conversion/checks in computation of component_function when used within linear_approximation_matrix

comment:13 Changed 18 months ago by
 Commit changed from 92dd237fb76ceca8b9c843760d27a2335019fa29 to ada90daf6b30beaf0e6aabab41523ba213bcceb1
Branch pushed to git repo; I updated commit sha1. New commits:
ada90da  cdef'ed SBox class

comment:14 Changed 18 months ago by
 Commit changed from ada90daf6b30beaf0e6aabab41523ba213bcceb1 to 9bd10441bb6b1bf6a24b7b0c07231986596c6f44
comment:15 Changed 18 months ago by
 Commit changed from 9bd10441bb6b1bf6a24b7b0c07231986596c6f44 to 16d7211a61e7251572db1a699de6c05e5affac6c
Branch pushed to git repo; I updated commit sha1. New commits:
16d7211  cythonized class methods of sbox

comment:16 Changed 18 months ago by
 Keywords cython sbox added
 Status changed from new to needs_review
I think the methods are now reasonable optimized. Computing the linear approximation matrix now takes roughly 40 to 80ms on my laptop  note that this heavily depends on the scaling of the matrix in the end:
sage: for j in range(10): ....: S = [x for x in range(256)];shuffle(S) ....: S = sage.crypto.sbox.SBox(S) ....: %time _ = S.linear_approximation_matrix() ....: CPU times: user 180 ms, sys: 24 ms, total: 204 ms Wall time: 191 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 84.6 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 85.2 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 84.5 ms CPU times: user 88 ms, sys: 0 ns, total: 88 ms Wall time: 84.9 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 84.5 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 83.6 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 84.1 ms CPU times: user 88 ms, sys: 0 ns, total: 88 ms Wall time: 85.9 ms CPU times: user 84 ms, sys: 0 ns, total: 84 ms Wall time: 85.3 ms sage: for j in range(10): ....: S = [x for x in range(256)];shuffle(S) ....: S = sage.crypto.sbox.SBox(S) ....: %time _ = S.linear_approximation_matrix("fourier_coefficient") ....: CPU times: user 48 ms, sys: 0 ns, total: 48 ms Wall time: 49.3 ms CPU times: user 44 ms, sys: 0 ns, total: 44 ms Wall time: 42.6 ms CPU times: user 48 ms, sys: 0 ns, total: 48 ms Wall time: 45.7 ms CPU times: user 40 ms, sys: 0 ns, total: 40 ms Wall time: 40.6 ms CPU times: user 44 ms, sys: 0 ns, total: 44 ms Wall time: 44.5 ms CPU times: user 40 ms, sys: 0 ns, total: 40 ms Wall time: 40.5 ms CPU times: user 44 ms, sys: 0 ns, total: 44 ms Wall time: 44.3 ms CPU times: user 44 ms, sys: 0 ns, total: 44 ms Wall time: 41.7 ms CPU times: user 44 ms, sys: 0 ns, total: 44 ms Wall time: 44.5 ms CPU times: user 40 ms, sys: 0 ns, total: 40 ms Wall time: 40.9 ms
comment:17 Changed 18 months ago by
 Commit changed from 16d7211a61e7251572db1a699de6c05e5affac6c to 95c38c6d8ce6f6fa3d6b6b7ff3e1f1ed2d7db9a3
Branch pushed to git repo; I updated commit sha1. New commits:
95c38c6  cdef'd more loop variables, changes based on Jeroen's comments

comment:18 Changed 17 months ago by
 Commit changed from 95c38c6d8ce6f6fa3d6b6b7ff3e1f1ed2d7db9a3 to a476adf53f4e7e122365017babf0b2e249ad5e9e
Branch pushed to git repo; I updated commit sha1. New commits:
a476adf  Merge branch 'develop' into t/25633/speed_up_sbox_module

comment:19 Changed 17 months ago by
 Commit changed from a476adf53f4e7e122365017babf0b2e249ad5e9e to 2299dcbffb52a45df70ee337199e231e6325c02d
Branch pushed to git repo; I updated commit sha1. New commits:
2299dcb  pyflakes changes

comment:20 Changed 17 months ago by
 Commit changed from 2299dcbffb52a45df70ee337199e231e6325c02d to eb7866cec0976675a187d9190e82b0cbb4b2c672
Branch pushed to git repo; I updated commit sha1. New commits:
eb7866c  fix doctests

comment:21 Changed 16 months ago by
 Milestone changed from sage8.3 to sage8.4
comment:22 Changed 16 months ago by
 Commit changed from eb7866cec0976675a187d9190e82b0cbb4b2c672 to 6f3cd548565ca1bc82aab9cf0944bd11341301d5
Branch pushed to git repo; I updated commit sha1. New commits:
6f3cd54  Merge branch 'develop' into t/25633/speed_up_sbox_module

comment:23 Changed 13 months ago by
 Commit changed from 6f3cd548565ca1bc82aab9cf0944bd11341301d5 to fca8bbe0ec01030328b66436234af7c7717c8106
Branch pushed to git repo; I updated commit sha1. New commits:
fca8bbe  Merge branch 'develop' into t/25633/speed_up_sbox_module

comment:24 Changed 13 months ago by
 Milestone changed from sage8.4 to sage8.5
comment:25 Changed 13 months ago by
 Cc tscrim added
 Status changed from needs_review to needs_work
Merge conflict. Once rebased on the latest beta, I will review this.
comment:26 Changed 13 months ago by
 Commit changed from fca8bbe0ec01030328b66436234af7c7717c8106 to 3ab449cf68a445cb57fc0c0221466e957458fd99
Branch pushed to git repo; I updated commit sha1. New commits:
3ab449c  Merge branch 'develop' into t/25633/speed_up_sbox_module

comment:27 Changed 13 months ago by
In #25708 some SBox class methods were deprecated. These now throw an error, when called:
sage: from sage.crypto.sbox import SBox sage: S = SBox(7,6,0,4,2,5,1,3) sage: S.difference_distribution_matrix() /home/asante/werkstatt/werkbank/sage/src/bin/sageipython:1: DeprecationWarning: difference_distribution_matrix is deprecated. Please use difference_distribution_table instead. See https://trac.sagemath.org/25708 for details. #!/usr/bin/env sagepython23  TypeError Traceback (most recent call last) <ipythoninput39af42cf79b34> in <module>() > 1 S.difference_distribution_matrix() /home/asante/werkstatt/werkbank/sage/local/lib/python2.7/sitepackages/sage/misc/superseded.pyc in __call__(self, *args, **kwds) 426 return self.func(*args, **kwds) 427 else: > 428 return self.func(self.instance, *args, **kwds) 429 430 def __get__(self, inst, cls=None): TypeError: __call__() takes exactly 0 positional arguments (1 given)
Should I fix this within this ticket or create a new one? And does anyone has an idea whats wrong with this deprecation? The new methods work as expected.
New commits:
3ab449c  Merge branch 'develop' into t/25633/speed_up_sbox_module

comment:28 Changed 13 months ago by
We should fix this within this ticket by not making it an alias but manually issuing the deprecation warning and calling the corresponding function. So something like this:
def difference_distribution_matrix(self): """ Deprecated in :trac:`25708` for :meth:`difference_distribution_table`. """ from sage.misc.superseded import deprecation deprecation(25708, "difference_distribution_matrix is deprecated. Please use difference_distribution_table instead.") return self.difference_distribution_table()
Since this is slated to be removed, it is not worth the time to add doctests.
comment:29 Changed 13 months ago by
Some comments:
 Instead of using
long
andint
, you should use the more machineindependentPy_ssize_t
(unless you have a specific reason to limit the size, such as you know these must be small values).
 In
_nterms
, it would be good to marktotal
,divisor
, andvar_choices
asPy_ssize_t
. Do you also wantvar_choices/divisor
to be floor/integer division? If so, I think it is better to dovar_choices//divisor
to better emphasize this.
 It might be better to not declare
i
to be abint
infrom_bits
as you never use that fact (so it avoids extra checks in Cython). Also, why is there even aswp
function? Would it be better to just doif self._big_endian: x = list(reversed(x)) return ZZ([i for i in self._rpad(swp(x), n)], 2)
 Replace
[GF(2)(0)]
with[GF(2).zero()]
.  Why are you converting to
ZZ
hereX = ZZ([ZZ(i) for i in X], 2)
wheni
is already anint
? It might also be good to take care of this comment inInteger
:# should probably also have fast code for Python ints...
It actually seems like a good thing to factor the code to construct an integer from a bitstring out into acdef
helper function.  It seems redundant to call
vector
herereturn K(vector(GF(2), out))
. Shouldn'tK
be able to handleout
?  You could consider making
output_bits
into an array that you manuallyalloc
/free
so you will have better type control and can initialize without having to create a temp object to reverse it.  You could use the
get_unsafe
operations forlat
andact
(once you explicitly cast it as aMatrix
object).  This is not correct:
cdef A = []
. I believe you wantcdef list A = []
.  Add the
cdef list
declaration:cdef list L = [self(i) for i in range(1 << m)]
.  I would factor out the
substitute
into acdef
function and explicitly declare variable types.
comment:30 Changed 13 months ago by
 Commit changed from 3ab449cf68a445cb57fc0c0221466e957458fd99 to 3d9232efc735c3933a500e91c574c5c83482f0bc
comment:31 followup: ↓ 32 Changed 13 months ago by
regarding the cdef Py_ssize_t
, should I also change parts like the following?
cdef int i x = [GF(2)(i) for i in ZZ(x).digits(base=2, padto=n)]
Another thing that comes to my mind, when I see this part: I'm not so sure how helpful it actually is, to convert the digits to GF(2) elements, as we are only returning a list and not a vector.
But I guess changing this could cause some other code to break, so it may not be good to do so?
comment:32 in reply to: ↑ 31 Changed 13 months ago by
Replying to asante:
regarding the
cdef Py_ssize_t
, should I also change parts like the following?cdef int i x = [GF(2)(i) for i in ZZ(x).digits(base=2, padto=n)]
I don't see the benefit of saying/converting i
to an int
because the GF(2)
element constructor does not use that information. So it essentially adds another useless check. Thus, I would remove the cdef int i
altogether.
Another thing that comes to my mind, when I see this part: I'm not so sure how helpful it actually is, to convert the digits to GF(2) elements, as we are only returning a list and not a vector.
A list is lightweight compared to a vector and implemented directly into the Python interpreter/language, so it is much faster.
comment:33 followup: ↓ 34 Changed 13 months ago by
OK. I added this cdef int i
because, IIRC, Jeroen said that it is faster, because cython can then optimize the for i in
loop stuff. But maybe I just misunderstood this and its only faster when doing something like for i in range(...)
?
But OK, I'll just remove it then.
Regarding the second point: should I remove the conversion to GF(2) then, or leave it?
comment:34 in reply to: ↑ 33 Changed 13 months ago by
Replying to asante:
OK. I added this
cdef int i
because, IIRC, Jeroen said that it is faster, because cython can then optimize thefor i in
loop stuff. But maybe I just misunderstood this and its only faster when doing something likefor i in range(...)
? But OK, I'll just remove it then.
Yes, that is when it is faster because Cython converts it to a C for
loop. It is also useful for when you are using the type declaration.
Regarding the second point: should I remove the conversion to GF(2) then, or leave it?
IDK as this is outside of my area of math. If it is the object you return and mathematically it should be in GF(2)
, then you should keep the conversion. I was merely commenting on your question above returning a list versus a vector.
Microoptimization:
F = GF(2) x = [F(i) for i in ZZ(x).digits(base=2, padto=n)]
so you do not (re)create GF(2)
on every call.
comment:35 Changed 13 months ago by
 Commit changed from 3d9232efc735c3933a500e91c574c5c83482f0bc to c7cc765f8ea5bdb6e088cbddccacc47e4ec72268
Branch pushed to git repo; I updated commit sha1. New commits:
5ace44d  removed swp function from to/from_bits method

b46866d  changed GF(2)(0) to GF(2).zero()

5feb31d  changing a leftover in a comment from the SBox.*_matrix deprecation

b1bcc95  to/from bits clean up

533731e  cdef'ed integers, using Py_ssize_t here

2d5125c  cdef'ed lists and slightly refactored SBox.*_table methods

24a8bbe  clean up the SBox construction functions, as they duplicate some code

89cf15f  restructured SBox.__call__

c7cc765  simplify SBox.interpolation_polyonmial by utilizing SBox.__call__

comment:36 followup: ↓ 37 Changed 13 months ago by
I tried to commit the changes in logical parts, hopefully thats ok for reviewing.
Regarding commit 24a8bbe 'clean up the SBox construction functions, as they duplicate some code'  I'm not sure how to properly test the cdef functions used here. Are these indirect doctests ok or should I think about some better tests?
comment:37 in reply to: ↑ 36 Changed 13 months ago by
Replying to asante:
I tried to commit the changes in logical parts, hopefully thats ok for reviewing.
Yes, that is good (both for reviewing and history, makes it easier to find when bugs might have been introduced; I am not as good as I should be with granularity of my commits...).
Regarding commit 24a8bbe 'clean up the SBox construction functions, as they duplicate some code'  I'm not sure how to properly test the cdef functions used here. Are these indirect doctests ok or should I think about some better tests?
Indirect doctests are good. Strictly speaking, I think our policy is cdef
functions/methods do not need doctests, but it is better to have them when can be meaningful.
comment:38 Changed 13 months ago by
As a data point for why list
is faster:
sage: L = list(map(GF(2), [1,0,1,0,1])) sage: %timeit k(vector(L)) The slowest run took 896.22 times longer than the fastest. This could mean that an intermediate result is being cached. 10000 loops, best of 3: 81.6 µs per loop sage: %timeit k(L) 10000 loops, best of 3: 22.5 µs per loop
I am planning to go through the files here this morning and see what other changes I can do to try and speed things up further. Do you have some computations you have been doing for timings of various parts (besides comment:16)? (I do not know anything of the mathematics, so I don't know what are meaningful examples.)
comment:39 Changed 13 months ago by
No I have no other timing checks. And its also quite late here, so I wont have much time for some experiments in the next few hours..
From my intuition it might be worth to look for optimisations in the __call__
method. I assume it is the most called method and the current input handling looks like it takes some time.. But I also do not know how to handle this more elegant.
It's basically only a array look up, but for sake of easier handling, we also want to support vector/list/tuple inputs (that encodes the array index as a binary string) or a finite field element (where the coefficient vector encodes the index as binary string, or, equivalent, the return value is the evaluation of the polynomial representation of the sbox).
Currently this is done with this try blocks, but I guess catching exceptions just to handle different input types might not be the fastest option.
comment:40 Changed 13 months ago by
Is this a bug? According to the docstring in __call__
, it is:
sage: from sage.crypto.sbox import SBox sage: S = SBox([7,6,0,4,2,5,1,3]) sage: k.<a> = GF(2^3) sage: vector(a^2) (0, 0, 1) sage: S([0,0,1]) [1, 1, 0] sage: S(vector(a^2)) [1, 1, 0] sage: S(a^2) # From the docstring, this should be 1 + a a
comment:41 Changed 12 months ago by
Thanks for checking this. I stumbled across this during changing to cython. In the end I was a bit confused and then forgot about this again. But I think you are right, it should be 1 + a.
When fixing this thou, other doctests that rely on the current behaviour fail:
********************************************************************** File "src/sage/crypto/sbox.pyx", line 1007, in sage.crypto.sbox.SBox.interpolation_polynomial Failed example: f Expected: x^6 + a*x^5 + (a + 1)*x^4 + (a^2 + a + 1)*x^3 + (a^2 + 1)*x^2 + (a + 1)*x + a^2 + a + 1 Got: (a^2 + a + 1)*x^6 + a^2*x^5 + (a + 1)*x^4 + (a^2 + a)*x^3 + x^2 + a*x + a^2 + a + 1 ********************************************************************** File "src/sage/crypto/sbox.pyx", line 1759, in sage.crypto.sbox.SBox.is_monomial_function Failed example: S.interpolation_polynomial() Expected: (a + 1)*x^6 + (a^2 + a + 1)*x^5 + (a^2 + 1)*x^3 Got: (a + 1)*x^6 + (a^2 + a + 1)*x^5 + (a^2 + a)*x^4 + (a^2 + 1)*x^3 + a*x^2 + a*x ********************************************************************** File "src/sage/crypto/sbox.pyx", line 1763, in sage.crypto.sbox.SBox.is_monomial_function Failed example: S.is_monomial_function() Expected: True Got: False ********************************************************************** File "src/sage/crypto/sbox.pyx", line 1765, in sage.crypto.sbox.SBox.is_monomial_function Failed example: S.interpolation_polynomial() Expected: x^6 Got: (a^2 + a)*x^6 + (a + 1)*x^5 + (a + 1)*x^4 + (a^2 + a + 1)*x^3 + (a^2 + a)*x **********************************************************************
Should I change this doctests? This might also brake other code, could'nt it?
comment:42 Changed 12 months ago by
I knew these were failing, but I did not want to change them until I knew what the correct behavior was (specifically, if I switched the endianness, then the tests passed). I will change them as I continue my review. It is possible that it will break code in the wild, but I would treat that as a spacebar.
comment:43 Changed 12 months ago by
With my latest changes, I am getting a speedup from your last comment of 72ms
and 36ms
to 60ms
and <22ms
respectively.
Some other timings of current: without my last commit:
sage: from sage.crypto.sbox import SBox sage: S = SBox([7,6,0,4,2,5,1,3]) sage: k.<a> = GF(2^3) sage: x = vector(a) sage: %timeit S([0,1,0]) 100000 loops, best of 3: 10.5 µs per loop sage: %timeit S(a) 10000 loops, best of 3: 50.6 µs per loop sage: %timeit S(x) 10000 loops, best of 3: 32.4 µs per loop sage: %timeit S(5) 10000000 loops, best of 3: 138 ns per loop
vs your branch:
sage: %timeit S([0,1,0]) 100000 loops, best of 3: 14.5 µs per loop sage: %timeit S(a) 10000 loops, best of 3: 65.3 µs per loop sage: %timeit S(x) 10000 loops, best of 3: 33.9 µs per loop sage: %timeit S(5) 1000000 loops, best of 3: 434 ns per loop
comment:44 Changed 12 months ago by
With my last commit, I now get 50ms
and 13ms
respectively. Little tweaks and looking at the C code can go a long way. :)
comment:45 Changed 12 months ago by
 Branch changed from u/asante/speed_up_sbox_module to public/crypto/speedup_sbox25633
 Commit changed from c7cc765f8ea5bdb6e088cbddccacc47e4ec72268 to a8ec337bb336320c9c01737aa0b4a7366530a401
 Reviewers set to Travis Scrimshaw
 Status changed from needs_work to needs_review
Okay, I have done what I can other than special casing the bit in/output for ZZ
. Yet, I think this is good for now.
However, I would like verification from someone else who does use the F_{2} call input to the SBox to verify that the docstring was indeed how it was suppose to be treated. Subsequently, I would like verification about the corresponding doctests that needed to be changed.
New commits:
fdfbbd5  PEP8 and some standardization of error messages and docstrings.

b2b4724  Fixing deprecated function.

c291c1f  Some optimizations avoiding function calls and improving __call__.

6703a2f  Fixing doctests and some other speedups.

0f2b8ed  A few more speedups and improvements.

cd376ec  Some PEP8 and cleanup of boolean_function.pyx.

a8ec337  Some little fixes to improve the C code.

comment:46 Changed 12 months ago by
 Commit changed from a8ec337bb336320c9c01737aa0b4a7366530a401 to 65352b790482abe5d04537ad714ad42eae9d881d
comment:47 Changed 12 months ago by
I took care of two other todo points I mentioned above. Now I am done.
comment:48 followup: ↓ 49 Changed 12 months ago by
Thanks for your work :)
Is the change in rings/integer.pyx on purpose within this ticket?
Apart from that, LGTM. I'll see if I can find someone who used the interpolation stuff (maybe Martin Albrecht did).
comment:49 in reply to: ↑ 48 Changed 12 months ago by
Replying to asante:
Thanks for your work :)
No problem.
Is the change in rings/integer.pyx on purpose within this ticket?
Yes, it is.
Apart from that, LGTM. I'll see if I can find someone who used the interpolation stuff (maybe Martin Albrecht did).
Great, thank you. Looks like we will need to rebase the branch on the latest version before finalizing this, but that should be trivial.
comment:50 Changed 12 months ago by
 Commit changed from 65352b790482abe5d04537ad714ad42eae9d881d to 5126cb813618cc8e17790bc4a77034b382c2aca2
Branch pushed to git repo; I updated commit sha1. New commits:
5126cb8  Merge branch 'public/crypto/speedup_sbox25633' of git://trac.sagemath.org/sage into public/crypto/speedup_sbox25633

comment:51 Changed 12 months ago by
I just did the trivial rebase.
comment:52 Changed 6 months ago by
So in order to get this ticket closed at some point and as https://groups.google.com/forum/#!searchin/sagedevel/sbox$20interpolation_polynomial%7Csort:date/sagedevel/_fLVLr4B7Q/KnxL9WnyBQAJ did not really resulted in a final decision, should we go with Martin's suggestion and keep the current behaviour, and discuss it in the docs?
This is a follow up ticket of #25516.