Opened 6 years ago
Closed 6 years ago
#19422 closed enhancement (fixed)
A new structure for Punctured Codes
Reported by:  dlucas  Owned by:  

Priority:  major  Milestone:  sage7.3 
Component:  coding theory  Keywords:  
Cc:  wdj, ppurka  Merged in:  
Authors:  David Lucas  Reviewers:  Julien Lavauzelle 
Report Upstream:  N/A  Work issues:  
Branch:  45d4ee1 (Commits, GitHub, GitLab)  Commit:  45d4ee111d235690820ef35269fb8991aa6f4abb 
Dependencies:  #19653  Stopgaps: 
Description (last modified by )
This ticket proposes a new implementation for punctured codes.
It contains:
 a new code class,
PuncturedCode
 a dedicated encoder to compute the generator matrix of a Punctured Code.
 a dedicate decoder which uses the original code to correct errors
This new structure presents the following advantages:
 it keeps track of the code it comes from. If one punctures a code A, the punctured code one gets will remember it comes from A.
 This allows to perform several operations, like encoding or picking a random element without constructing the generator matrix for the punctured code, thus saving time and memory.
 because it keeps track of the original code, it's possible to get a structured representation of the punctured code, e.g. if one punctures a GRS code and asks for the structured representation of the punctured code one got, a GRS code will be returned.
Change History (44)
comment:1 Changed 6 years ago by
 Branch set to u/dlucas/punctured_codes
comment:2 Changed 6 years ago by
 Commit set to 37683a13c904825582e9a4457a44ba0b70168a33
 Status changed from new to needs_review
comment:3 Changed 6 years ago by
AbstractLinearCode
has a method punctured
. Shouldn't this return a PuncturedCode
object after this patch?
comment:4 Changed 6 years ago by
I chose to keep this method as it directly returns a LinearCode
object.
So if one wants to puncture and immediately get the LinearCode
representation of his punctured code, he can thanks to this method. I plan to change it when the mechanism to get another representation of a punctured code (e.g. get a RS code from a punctured RS code) will be ready.
comment:5 Changed 6 years ago by
comment:6 Changed 6 years ago by
I think the default should always be to retain as much structure as possible. The C.punctured()
method is certainly the "default" a user would call.
Furthermore, having a PuncturedCode
should never get in the way for users who are simply interested in the unstructured linear code representation: it advertises exactly the same methods (plus some more). So I don't think there is any deprecation needed.
comment:7 Changed 6 years ago by
 Commit changed from 37683a13c904825582e9a4457a44ba0b70168a33 to 7f9d27ed9483b51342702e115a74e37904cde921
Branch pushed to git repo; I updated commit sha1. New commits:
7f9d27e  AbstractLinearCode's punctured method now returns a PuncturedCode object

comment:8 Changed 6 years ago by
Ok, I agree with you.
Changes have been made and pushed.
comment:9 Changed 6 years ago by
 Commit changed from 7f9d27ed9483b51342702e115a74e37904cde921 to ca859cbf3e6b86ccc218d6e51813ae45b9babb34
comment:10 Changed 6 years ago by
 Description modified (diff)
A few updates on this ticket:
 it's now under the latest beta
 as we now have the structure for decoders in Sage, I added a dedicated decoder for punctured codes
 as we now have GRS codes in Sage, which still are GRS codes after puncturing, I introduced a mechanism which allows one to get the most specific representation possible for a punctured code, based on the original code.
This is still open for review.
comment:11 Changed 6 years ago by
 Commit changed from ca859cbf3e6b86ccc218d6e51813ae45b9babb34 to 26aae1797b74fb3d27b11d73958884e013c1dc28
Branch pushed to git repo; I updated commit sha1. New commits:
26aae17  Fixed conflict after update to 7.1beta0

comment:12 Changed 6 years ago by
 Milestone changed from sage6.10 to sage7.1
I updated this ticket to latest beta, and fixed merge conflict. It's still open for review.
comment:13 Changed 6 years ago by
 Commit changed from 26aae1797b74fb3d27b11d73958884e013c1dc28 to fa355cd4cbc783dc0435ab62c90f52492af6e64f
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
8b654ac  Renamed method precompute and changed its doc

b60ca15  Rewrote documentation of partial_xgcd and made it a private method

5ed4ac3  Fixed error in errorerasure's decoding radius

03c15f3  Removed useless backslashes

cc3cb4d  Rewrote error message in KeyEquationSyndromeDecoder

a7f3ee1  Fixed typos

c8bc406  Rewrote some examples

7cbcca2  Rewrote some sentences

4aebb5a  Fixed bug related to full codes, plus some minor tweaks

fa355cd  Enhanced decoder documentation

comment:14 Changed 6 years ago by
 Dependencies set to #19653
I fixed a few typos, updated deprecated imports which were breaking doctests. I also completely rewrote doctests and documentation of the decoder, using GRS decoders introduced in #19653.
This is still open for review.
comment:15 Changed 6 years ago by
 Cc wdj ppurka added
comment:16 Changed 6 years ago by
 Commit changed from fa355cd4cbc783dc0435ab62c90f52492af6e64f to 5c61b119d82215569131ea6741f5abbc0ef2da2f
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
a074cb8  Changed my helper methods into private methods

6ad583f  Fixed bug in KeyEquation decoder's decode_to_code, added sanity checks, enhanced doctests

fca099e  Added new sanity check on the output of BW and Gao decoder

57dbfbf  Rewrote random_element method

7953d60  Proper sanity checks for output of decode_to_* methods

8673ac5  Optimized a bit polynomial division in Gao and BW

e1b6b09  Merged now postive reviewed #19666 and fixed conflicts

5093dce  Update to 7.1beta5

f3b4b11  Fixed typo in GuruswamiSudan doc

5c61b11  Merged dependecies and fixed conflicts

comment:17 Changed 6 years ago by
I updated this ticket to the latest beta and fixed conflicts. This is still open for review.
comment:18 Changed 6 years ago by
 Commit changed from 5c61b119d82215569131ea6741f5abbc0ef2da2f to 69da167f0a9d4a0fc68e0846416577a5ddb44eba
comment:19 Changed 6 years ago by
 Milestone changed from sage7.1 to sage7.2
I updated this ticket to 7.2beta1 and fixed a broken doctest. This is still open for review.
comment:20 Changed 6 years ago by
I start the review even though the ticket is not updated to the latest beta. Despite my comments, note that I realize that building this framework must have been a huge work, and that I found it really userfriendly =)
* General remarks *
 As I read the (quite long) name of the canonical decoder of your punctured code (
PuncturedCodeOriginalCodeDecoder
), I asked myself the following question: if I understand well your naming convention, decoders are namedCode
+DecoderName
. Is it necessary to have theCode
suffix ? My point is that it could be sufficient to only keepDecoderName
, because if a user wants to know what kind of code the decoder is applied to, he just needs to callDec.code()
.  You consider puncturing points as a list. Would it be better (and more confortable to compute unions and intersections) to consider them a set ?
* index.rst *
 why do you remove
sage/coding/source_coding/huffman
?
* src/sage/coding/grs.py *
 The method
_punctured_form(self, points)
doesn't work ifpoints
is not sorted (e.g.C_grs._punctured_form([4,3])
). That's due to your strange way to select the evaluation points and multipliers. I think you should remove the block:start = 0 for i in points: punctured_alphas += alphas[start:i] punctured_col_mults += col_mults[start:i] start = i + 1 punctured_alphas += alphas[start:n] punctured_col_mults += col_mults[start:n]
and replace it by something like (that's only a suggestion):
punctured_alphas = [ alphas[i] for i in range(n) if i not in points ] punctured_col_mults = [ col_mults[i] for i in range(n) if i not in points ]
 In the same method, if I understand it well, this block:
if G.rank() != dimension: G = G.echelon_form() for i in range(dimension): if G[i] == 0: dimension = 1
aims at computing the code dimension (stored in dimension
). In fact, you already computed it in G.rank()
.
* src/sage/coding/linear_code.py *
 in
_punctured_form
method,G.echelon_form()
puts all the zero lines at the bottom of the new matrix. Thus, if you want to get the nonzero part of the matrix, I think you could simply writek = G.rank() return LinearCode(G[:k])
comment:21 Changed 6 years ago by
 Status changed from needs_review to needs_work
* src/sage/coding/punctured_code.py *
 function puncture

 same strange way to compute the nonpunctured positions than in grs.py. For example, it doesn't work with:
C = codes.RandomLinearCode(11, 5, GF(7)) Cp = codes.PuncturedCode(C, [4,3]) v = vector(GF(7), (2,3,0,2,1,5,1,5,6,5,3)) sage.coding.punctured_code.puncture(v, [4,3], Cp)
 class PuncturedCode

 constructor:
positions = list(set(positions))
is a short (maybe dirty) way to do:
unique_positions = set() for i in positions: unique_positions.add(i) positions = [] for i in unique_positions: positions.append(i)
and it looks to sort the elements as well =) Also remark that you don't have this "multiplicity" issue when using Python sets (instead of lists) for storing puncturing points.
 in encode() method, you missed parentheses after
self.punctured_positions
inreturn puncture(c, self.punctured_positions, self)
 in structured_representation(self) method: in the doc, you wrote "A punctured GRS code is still a punctured code" but I think you meant "still a GRS code". Moreover, I don't understand what you mean by "Which is not the case for generic linear codes": punctured linear codes are still linear codes.
 still in structured_representation(self) method, I tried this:
sage: C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4) sage: P = codes.PuncturedCode(C, [1,3]) sage: Q = codes.PuncturedCode(P, [1,3]) sage: P.length() 5 sage: Q.length() 3 sage: P.structured_representation() [5, 4, 2] Generalized ReedSolomon Code over Finite Field of size 7 sage: Q.structured_representation() [5, 3, 3] Generalized ReedSolomon Code over Finite Field of size 7
So, double puncturing works well (Q
seems to be the right code), but structured_representation()
doesn't build the right ReedSolomon code.
 class PuncturedCodePuncturedMatrixEncoder

 in
GeneratorMatrix
: same comment as in linear_code.py, I think echelon_form()do the job and you just need to keep thek
first rows.
 class PuncturedCodeOriginalCodeDecoder

 sage is stuck (infinite loop ?) when I run this:
sage: R = codes.RandomLinearCode(11, 5, GF(7)) sage: P = codes.PuncturedCode(R, [1,3]) sage: P.decoder()
 l.444: you wrote
'random_values'
instead of
'randomvalues'
 in the constructor: you have a test variable
error_erasure
to which you assign0
and1
. MaybeTrue
andFalse
are more explicit.  in
decode_to_code
: l.600, I think you forgot to incrementshift
(otherwise I misunderstood the use ofshift
)  maybe add an
elif self._strategy == 'tryall'
at line 643.  l.670:
ValueError("The number of erasures exceed decoding capability")
> exceeds  l.672: decoding capability must be:
return (diff  punctured 1) // 2
 l.665; you could have negative radius with:
if self._strategy != 'tryall' and "errorerasure" not in D.decoder_type(): return D.decoding_radius()  punctured
 I don't understand the last
return D.decoding_radius()
, line 675. To which case does it apply ?
comment:22 Changed 6 years ago by
 Commit changed from 69da167f0a9d4a0fc68e0846416577a5ddb44eba to 659105c517f3e6115e70a9f502f3681ca9075086
Branch pushed to git repo; I updated commit sha1. New commits:
050d00d  Merge branch 'develop' into punctured_code

31fbb4f  positions is now a set instead of a list

e3f8f6b  Reinstated huffman in index.rst

564760f  Rewrote _punctured_form in grs.py

0746385  Changed method _punctured_form in linear_code.py

a27ca48  Rewrote method puncture in punctured_code

b0decac  Changed docstring in structured_representation

d16f13a  Simplified generator_matrix computation in the encoder

21b3570  Added generic decoders as available decoders for punctured codes

659105c  Some tweaking to punctured_code.py

comment:23 Changed 6 years ago by
 Status changed from needs_work to needs_review
Hello Julien,
Thanks for your thorough review!
I followed all your suggestions (good idea to change the set of points to a list!). Some remarks/answers follow:
 As
positions
is now a set, the test of duplicate positions in the constructor is no longer needed.  Open question: now that I changed
positions
to a set, thestr
methods return the ugly str format of a set:set([3,4])
for{3, 4}
. Which means the string representation of a punctured code now statesPuncturedCode coming from ... punctured in position(s) set([pos])
which is quite ugly... So, do you think I should keep that, or, only for thesestr
methods return a list while it might be a bit confusing for the user because he/she entered a set a got a list?
sage is stuck (infinite loop ?) when I run this:
Oh, yes, it's not an infinite loop! It's because it builds the original decoder of your code, namely a SyndromeDecoder
for a linear code over GF(7), length 11, dimension 5... Which takes a while because of the table of syndromes which is heavy to build.
Best,
David
comment:24 Changed 6 years ago by
I'd prefer it if the user could enter either a set or a list of punctured positions, or a single position. The internal representation as a set probably makes sense though. I consider str and repr as just prettyprinting, so formatting as a (sorted) list make sense.
comment:25 Changed 6 years ago by
 Commit changed from 659105c517f3e6115e70a9f502f3681ca9075086 to cef203fb5010d120386968f432648a60338c9fcc
Branch pushed to git repo; I updated commit sha1. New commits:
cef203f  Reinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.

comment:26 followup: ↓ 27 Changed 6 years ago by
Hello,
The user can now pass either an Integer, an int, a list or a set to PuncturedCode
. The internal representation is a set. I added a few extra sanitization tests, so if one passes a list/set of anything but ints/Integers, an exception is raised.
I also reinstated the list representation of the punctured positions for str methods.
Remark:
for now, punctured_positions
(the getter for positions
) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.
This is still open for review.
Best,
David
comment:27 in reply to: ↑ 26 Changed 6 years ago by
Awesome!
for now,
punctured_positions
(the getter forpositions
) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.
I think this makes complete sense.
comment:28 Changed 6 years ago by
 Commit changed from cef203fb5010d120386968f432648a60338c9fcc to 2d13b55e6603e014bc7c79d3852384f3a314677f
Branch pushed to git repo; I updated commit sha1. New commits:
2d13b55  Changed punctured method to support a list of vectors

comment:29 Changed 6 years ago by
Hello,
I fixed a bug: Punctured Code's decoder was not working if its original decoder was a list decoder.
I changed puncture
method (which is BTW now called _puncture
as it's a helper function): it can now puncture a list of vectors or a vector. This takes care of the listdecoding issue.
Best,
David
comment:30 Changed 6 years ago by
 Commit changed from 2d13b55e6603e014bc7c79d3852384f3a314677f to 5bb39feb29a9c8da1ea503b4a01527b312520aaa
Branch pushed to git repo; I updated commit sha1. New commits:
771f3fe  decoder_type works now properly on uninstantiated classes

0d8ed86  Merge branch 'develop' into decoder_type_method_for_uninstanciated_classes

c2777ff  Merge branch 'decoder_type_method_for_uninstanciated_classes' into talk_secret

021b841  Merge branch 'punctured_code' into talk_secret

5bb39fe  Fixed bug related to sets and sorting in decode_to_code

comment:31 Changed 6 years ago by
 Branch changed from u/dlucas/punctured_codes to u/dlucas/punctured_code
 Commit 5bb39feb29a9c8da1ea503b4a01527b312520aaa deleted
comment:32 Changed 6 years ago by
 Commit set to 36e71c57401929089bc1938aa016b251b54f8916
Branch pushed to git repo; I updated commit sha1. Last 10 new commits:
564760f  Rewrote _punctured_form in grs.py

0746385  Changed method _punctured_form in linear_code.py

a27ca48  Rewrote method puncture in punctured_code

b0decac  Changed docstring in structured_representation

d16f13a  Simplified generator_matrix computation in the encoder

21b3570  Added generic decoders as available decoders for punctured codes

659105c  Some tweaking to punctured_code.py

cef203f  Reinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.

2d13b55  Changed punctured method to support a list of vectors

36e71c5  Fixed bug related to sets and sorting in decode_to_code

comment:33 Changed 6 years ago by
Sorry, I messed up with my git branches in the push before that.
Anyway, my last commit fixes a bug related to decode_to_code
: the loops I was using to insert elements in lists were failing because they were expecting to receive the positions to insert as a sorted list of positions.
I wrote a helper functions to deal with this, and it now works fine.
This is still open for review (and this time, I cannot find any more issues).
Best,
David
comment:34 Changed 6 years ago by
 Status changed from needs_review to needs_work
Hi David,
A few remarks  the last ones, I hope:
 in
structured_representation
method: it doesn't work when one concatenates puncturings. For example, the following scriptC = codes.GeneralizedReedSolomonCode(GF(7).list(), 4) P = codes.PuncturedCode(C, [0,6]) D = P.structured_representation() P2 = codes.PuncturedCode(P, [0,4]) D2 = P2.structured_representation()
fails at fifth line (so double puncturing actually works, only structured_representation
crashes). From the crash log, it seems that you don't build the right set of pts
.
 in
PuncturedCodePuncturedMatrixEncoder
, one can construct this encoder while passing as argument a code which is not a punctured one. As a direct consequence, this scripts fails:C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4) E = codes.encoders.PuncturedCodePuncturedMatrixEncoder(C) G = E.generator_matrix()
The question is: is it the user's mistake, or should the code handle it ? Maybe it's reasonable to throw an error.
 same remark for your decoder: one can pass a nonpunctured code as parameter.
 in
__init__
function ofPuncturedCodeOriginalCodeDecoder
, maybe it is better to useTrue/False
boolean values instead oferror_erasure = 0
anderror_erasure = 1
.
 I also prefer the compact notation:
e_list = [ one if i in pts else zero for i in range(Cor.length()) ]
instead of:
e_list = [] for i in range(Cor.length()): if i in pts: e_list.append(one) else: e_list.append(zero)
 finally, a typo, l.441 : default instead of dafault
I did not perform exhaustive tests of all the combinations of codes/encoders/decoders/strategies, because it is too long. So given the complexity of the feature you're trying to implement, it's still possible there remains some minor errors  especially with extreme settings  but maybe a real deep use of this class (by actual users) could make them appear more easily.
Thus, when you're done fixing the previous errors, I give the green light.
Julien
comment:35 Changed 6 years ago by
 Commit changed from 36e71c57401929089bc1938aa016b251b54f8916 to 8a3a084f960197006447424522ea768c4a89bb62
comment:36 Changed 6 years ago by
Hello Julien,
Thanks for your comments!
I fixed the bug in structured_representation
. I also fixed a bug I found while compiling the documentation, added input sanitization in the encoder and the decoder (speaking of which, these checks are missing almost everywhere... I'll open a new ticket to fix that in other classes).
I also removed nonPythonic syntax (fun fact: four lines below the ones you noticed, error_erasure
was set to True
or False
... Consistency, I haz it).
Should be ok now.
FYI, I tested locally errorerasure
, listdecoder
and "classical" decoders, standard decoding tests passed.
David
comment:37 Changed 6 years ago by
 Milestone changed from sage7.2 to sage7.3
 Status changed from needs_work to needs_review
comment:38 Changed 6 years ago by
Hi David,
Your fix of structured_representation
doesn't work. Try this:
C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4) Cp = codes.PuncturedCode(C, [5,6]) Cp2 = codes.PuncturedCode(Cp, [1,2])
If you print list_pts
at the end of the method, you'll see it is not [1,2,5,6]
as expected.
Here is a piece of code which should work:
while(isinstance(C, PuncturedCode)): cur_pts = list(C.punctured_positions()) list_len = len(list_pts) for p in cur_pts: for i in range(list_len): if (p <= list_pts[i]): list_pts[i] += 1 list_pts += cur_pts C = C.original_code() return C._punctured_form(set(list_pts))
I'm not satisfied about its algorithmic complexity, so you're free to improve it, even though I cannot believe that this piece of code will ever become a bottleneck of any algorithm.
I agree with the other fixes :)
Julien
comment:39 Changed 6 years ago by
If you print
list_pts
at the end of the method, you'll see it is not[1,2,5,6]
as expected.
Oh yes you're right, my solution only works if one erases the same coordinates repeatedly...
comment:40 Changed 6 years ago by
 Commit changed from 8a3a084f960197006447424522ea768c4a89bb62 to 45d4ee111d235690820ef35269fb8991aa6f4abb
Branch pushed to git repo; I updated commit sha1. New commits:
45d4ee1  Fixed structured_representation

comment:41 Changed 6 years ago by
I took your solution and tested it on several instances. I'm fine with it, even if it's not perfect.
David
comment:42 Changed 6 years ago by
 Reviewers set to Julien Lavauzelle
 Status changed from needs_review to positive_review
Now that's ok for me. Great job again !
Julien
comment:43 Changed 6 years ago by
Amazing, thanks!
comment:44 Changed 6 years ago by
 Branch changed from u/dlucas/punctured_code to 45d4ee111d235690820ef35269fb8991aa6f4abb
 Resolution set to fixed
 Status changed from positive_review to closed
Content available, it's now ready for review.
New commits:
New Punctured Codes object, coming with its own encoder