Opened 4 years ago

Closed 3 years ago

#19422 closed enhancement (fixed)

A new structure for Punctured Codes

Reported by: dlucas Owned by:
Priority: major Milestone: sage-7.3
Component: coding theory Keywords:
Cc: wdj, ppurka Merged in:
Authors: David Lucas Reviewers: Julien Lavauzelle
Report Upstream: N/A Work issues:
Branch: 45d4ee1 (Commits) Commit: 45d4ee111d235690820ef35269fb8991aa6f4abb
Dependencies: #19653 Stopgaps:

Description (last modified by dlucas)

This ticket proposes a new implementation for punctured codes.

It contains:

  • a new code class, PuncturedCode
  • a dedicated encoder to compute the generator matrix of a Punctured Code.
  • a dedicate decoder which uses the original code to correct errors

This new structure presents the following advantages:

  • it keeps track of the code it comes from. If one punctures a code A, the punctured code one gets will remember it comes from A.
  • This allows to perform several operations, like encoding or picking a random element without constructing the generator matrix for the punctured code, thus saving time and memory.
  • because it keeps track of the original code, it's possible to get a structured representation of the punctured code, e.g. if one punctures a GRS code and asks for the structured representation of the punctured code one got, a GRS code will be returned.

Change History (44)

comment:1 Changed 4 years ago by dlucas

  • Branch set to u/dlucas/punctured_codes

comment:2 Changed 4 years ago by dlucas

  • Commit set to 37683a13c904825582e9a4457a44ba0b70168a33
  • Status changed from new to needs_review

Content available, it's now ready for review.


New commits:

37683a1New Punctured Codes object, coming with its own encoder

comment:3 Changed 4 years ago by jsrn

AbstractLinearCode has a method punctured. Shouldn't this return a PuncturedCode object after this patch?

comment:4 Changed 4 years ago by dlucas

I chose to keep this method as it directly returns a LinearCode object. So if one wants to puncture and immediately get the LinearCode representation of his punctured code, he can thanks to this method. I plan to change it when the mechanism to get another representation of a punctured code (e.g. get a RS code from a punctured RS code) will be ready.

comment:5 Changed 4 years ago by dlucas

  • Authors set to David Lucas

comment:6 Changed 4 years ago by jsrn

I think the default should always be to retain as much structure as possible. The C.punctured() method is certainly the "default" a user would call.

Furthermore, having a PuncturedCode should never get in the way for users who are simply interested in the unstructured linear code representation: it advertises exactly the same methods (plus some more). So I don't think there is any deprecation needed.

comment:7 Changed 4 years ago by git

  • Commit changed from 37683a13c904825582e9a4457a44ba0b70168a33 to 7f9d27ed9483b51342702e115a74e37904cde921

Branch pushed to git repo; I updated commit sha1. New commits:

7f9d27eAbstractLinearCode's punctured method now returns a PuncturedCode object

comment:8 Changed 4 years ago by dlucas

Ok, I agree with you.

Changes have been made and pushed.

comment:9 Changed 4 years ago by git

  • Commit changed from 7f9d27ed9483b51342702e115a74e37904cde921 to ca859cbf3e6b86ccc218d6e51813ae45b9babb34

Branch pushed to git repo; I updated commit sha1. New commits:

4473a2cUpdated to latest beta and fixed conflicts
74fc44eAdded decoder for punctured codes
ca859cbAdded a mechanism to get the structured representation of any punctured code

comment:10 Changed 4 years ago by dlucas

  • Description modified (diff)

A few updates on this ticket:

  • it's now under the latest beta
  • as we now have the structure for decoders in Sage, I added a dedicated decoder for punctured codes
  • as we now have GRS codes in Sage, which still are GRS codes after puncturing, I introduced a mechanism which allows one to get the most specific representation possible for a punctured code, based on the original code.

This is still open for review.

comment:11 Changed 4 years ago by git

  • Commit changed from ca859cbf3e6b86ccc218d6e51813ae45b9babb34 to 26aae1797b74fb3d27b11d73958884e013c1dc28

Branch pushed to git repo; I updated commit sha1. New commits:

26aae17Fixed conflict after update to 7.1beta0

comment:12 Changed 4 years ago by dlucas

  • Milestone changed from sage-6.10 to sage-7.1

I updated this ticket to latest beta, and fixed merge conflict. It's still open for review.

comment:13 Changed 4 years ago by git

  • Commit changed from 26aae1797b74fb3d27b11d73958884e013c1dc28 to fa355cd4cbc783dc0435ab62c90f52492af6e64f

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

8b654acRenamed method precompute and changed its doc
b60ca15Rewrote documentation of partial_xgcd and made it a private method
5ed4ac3Fixed error in error-erasure's decoding radius
03c15f3Removed useless backslashes
cc3cb4dRewrote error message in KeyEquationSyndromeDecoder
a7f3ee1Fixed typos
c8bc406Rewrote some examples
7cbcca2Rewrote some sentences
4aebb5aFixed bug related to full codes, plus some minor tweaks
fa355cdEnhanced decoder documentation

comment:14 Changed 4 years ago by dlucas

  • Dependencies set to #19653

I fixed a few typos, updated deprecated imports which were breaking doctests. I also completely rewrote doctests and documentation of the decoder, using GRS decoders introduced in #19653.

This is still open for review.

comment:15 Changed 4 years ago by jsrn

  • Cc wdj ppurka added

comment:16 Changed 4 years ago by git

  • Commit changed from fa355cd4cbc783dc0435ab62c90f52492af6e64f to 5c61b119d82215569131ea6741f5abbc0ef2da2f

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

a074cb8Changed my helper methods into private methods
6ad583fFixed bug in KeyEquation decoder's decode_to_code, added sanity checks, enhanced doctests
fca099eAdded new sanity check on the output of BW and Gao decoder
57dbfbfRewrote random_element method
7953d60Proper sanity checks for output of decode_to_* methods
8673ac5Optimized a bit polynomial division in Gao and BW
e1b6b09Merged now postive reviewed #19666 and fixed conflicts
5093dceUpdate to 7.1beta5
f3b4b11Fixed typo in Guruswami-Sudan doc
5c61b11Merged dependecies and fixed conflicts

comment:17 Changed 4 years ago by dlucas

I updated this ticket to the latest beta and fixed conflicts. This is still open for review.

comment:18 Changed 4 years ago by git

  • Commit changed from 5c61b119d82215569131ea6741f5abbc0ef2da2f to 69da167f0a9d4a0fc68e0846416577a5ddb44eba

Branch pushed to git repo; I updated commit sha1. New commits:

f2e8c28Merge branch 'develop' into punctured_code
69da167Fixed broken doctest

comment:19 Changed 4 years ago by dlucas

  • Milestone changed from sage-7.1 to sage-7.2

I updated this ticket to 7.2beta1 and fixed a broken doctest. This is still open for review.

comment:20 Changed 4 years ago by jlavauzelle

I start the review even though the ticket is not updated to the latest beta. Despite my comments, note that I realize that building this framework must have been a huge work, and that I found it really user-friendly =)

* General remarks *

  • As I read the (quite long) name of the canonical decoder of your punctured code (PuncturedCodeOriginalCodeDecoder), I asked myself the following question: if I understand well your naming convention, decoders are named Code+DecoderName. Is it necessary to have the Code suffix ? My point is that it could be sufficient to only keep DecoderName, because if a user wants to know what kind of code the decoder is applied to, he just needs to call Dec.code().
  • You consider puncturing points as a list. Would it be better (and more confortable to compute unions and intersections) to consider them a set ?

* index.rst *

  • why do you remove sage/coding/source_coding/huffman ?

* src/sage/coding/grs.py *

  • The method _punctured_form(self, points) doesn't work if points is not sorted (e.g. C_grs._punctured_form([4,3])). That's due to your strange way to select the evaluation points and multipliers. I think you should remove the block:
    start = 0
    for i in points:
        punctured_alphas += alphas[start:i]
        punctured_col_mults += col_mults[start:i]
        start = i + 1
        punctured_alphas += alphas[start:n]
        punctured_col_mults += col_mults[start:n]
    

and replace it by something like (that's only a suggestion):

punctured_alphas = [ alphas[i] for i in range(n) if i not in points ]
punctured_col_mults = [ col_mults[i] for i in range(n) if i not in points ]
  • In the same method, if I understand it well, this block:
    if G.rank() != dimension:
        G = G.echelon_form()
        for i in range(dimension):
            if G[i] == 0:
            dimension -= 1
    

aims at computing the code dimension (stored in dimension). In fact, you already computed it in G.rank().

* src/sage/coding/linear_code.py *

  • in _punctured_form method, G.echelon_form() puts all the zero lines at the bottom of the new matrix. Thus, if you want to get the non-zero part of the matrix, I think you could simply write
        k = G.rank()
        return LinearCode(G[:k])
    

comment:21 Changed 4 years ago by jlavauzelle

  • Status changed from needs_review to needs_work

* src/sage/coding/punctured_code.py * --- function puncture ---

  • same strange way to compute the non-punctured positions than in grs.py. For example, it doesn't work with:
    C = codes.RandomLinearCode(11, 5, GF(7))
    Cp = codes.PuncturedCode(C, [4,3])
    v = vector(GF(7), (2,3,0,2,1,5,1,5,6,5,3))
    sage.coding.punctured_code.puncture(v, [4,3], Cp)
    

--- class PuncturedCode ---

  • constructor:

positions = list(set(positions)) is a short (maybe dirty) way to do:

unique_positions = set()
for i in positions:
    unique_positions.add(i)
positions = []
for i in unique_positions:
   positions.append(i)

and it looks to sort the elements as well =) Also remark that you don't have this "multiplicity" issue when using Python sets (instead of lists) for storing puncturing points.

  • in encode() method, you missed parentheses after self.punctured_positions in return puncture(c, self.punctured_positions, self)
  • in structured_representation(self) method: in the doc, you wrote "A punctured GRS code is still a punctured code" but I think you meant "still a GRS code". Moreover, I don't understand what you mean by "Which is not the case for generic linear codes": punctured linear codes are still linear codes.
  • still in structured_representation(self) method, I tried this:
    sage: C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
    sage: P = codes.PuncturedCode(C, [1,3])
    sage: Q = codes.PuncturedCode(P, [1,3])
    sage: P.length()
    5
    sage: Q.length()
    3
    sage: P.structured_representation()
    [5, 4, 2] Generalized Reed-Solomon Code over Finite Field of size 7
    sage: Q.structured_representation()
    [5, 3, 3] Generalized Reed-Solomon Code over Finite Field of size 7
    

So, double puncturing works well (Q seems to be the right code), but structured_representation() doesn't build the right Reed-Solomon code.

--- class PuncturedCodePuncturedMatrixEncoder ---

  • in GeneratorMatrix: same comment as in linear_code.py, I think echelon_form()do the job and you just need to keep the k first rows.

--- class PuncturedCodeOriginalCodeDecoder ---

  • sage is stuck (infinite loop ?) when I run this:
    sage: R = codes.RandomLinearCode(11, 5, GF(7))
    sage: P = codes.PuncturedCode(R, [1,3])
    sage: P.decoder()
    
  • l.444: you wrote 'random_values' instead of 'random-values'
  • in the constructor: you have a test variable error_erasure to which you assign 0 and 1. Maybe True and False are more explicit.
  • in decode_to_code: l.600, I think you forgot to increment shift (otherwise I misunderstood the use of shift)
  • maybe add an elif self._strategy == 'try-all' at line 643.
  • l.670: ValueError("The number of erasures exceed decoding capability") --> exceeds
  • l.672: decoding capability must be:
        return (diff - punctured -1) // 2
    
  • l.665; you could have negative radius with:
    if self._strategy != 'try-all' and "error-erasure" not in D.decoder_type():
        return D.decoding_radius() - punctured
    
  • I don't understand the last return D.decoding_radius(), line 675. To which case does it apply ?

comment:22 Changed 4 years ago by git

  • Commit changed from 69da167f0a9d4a0fc68e0846416577a5ddb44eba to 659105c517f3e6115e70a9f502f3681ca9075086

Branch pushed to git repo; I updated commit sha1. New commits:

050d00dMerge branch 'develop' into punctured_code
31fbb4fpositions is now a set instead of a list
e3f8f6bReinstated huffman in index.rst
564760fRewrote _punctured_form in grs.py
0746385Changed method _punctured_form in linear_code.py
a27ca48Rewrote method puncture in punctured_code
b0decacChanged docstring in structured_representation
d16f13aSimplified generator_matrix computation in the encoder
21b3570Added generic decoders as available decoders for punctured codes
659105cSome tweaking to punctured_code.py

comment:23 Changed 4 years ago by dlucas

  • Status changed from needs_work to needs_review

Hello Julien,

Thanks for your thorough review!

I followed all your suggestions (good idea to change the set of points to a list!). Some remarks/answers follow:

  • As positions is now a set, the test of duplicate positions in the constructor is no longer needed.
  • Open question: now that I changed positions to a set, the str methods return the ugly str format of a set: set([3,4]) for {3, 4}. Which means the string representation of a punctured code now states PuncturedCode coming from ... punctured in position(s) set([pos]) which is quite ugly... So, do you think I should keep that, or, only for these str methods return a list while it might be a bit confusing for the user because he/she entered a set a got a list?

sage is stuck (infinite loop ?) when I run this:

Oh, yes, it's not an infinite loop! It's because it builds the original decoder of your code, namely a SyndromeDecoder for a linear code over GF(7), length 11, dimension 5... Which takes a while because of the table of syndromes which is heavy to build.

Best,

David

comment:24 Changed 4 years ago by jsrn

I'd prefer it if the user could enter either a set or a list of punctured positions, or a single position. The internal representation as a set probably makes sense though. I consider str and repr as just pretty-printing, so formatting as a (sorted) list make sense.

comment:25 Changed 4 years ago by git

  • Commit changed from 659105c517f3e6115e70a9f502f3681ca9075086 to cef203fb5010d120386968f432648a60338c9fcc

Branch pushed to git repo; I updated commit sha1. New commits:

cef203fReinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.

comment:26 follow-up: Changed 4 years ago by dlucas

Hello,

The user can now pass either an Integer, an int, a list or a set to PuncturedCode. The internal representation is a set. I added a few extra sanitization tests, so if one passes a list/set of anything but ints/Integers, an exception is raised.

I also reinstated the list representation of the punctured positions for str methods.

Remark:

for now, punctured_positions (the getter for positions) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.

This is still open for review.

Best,

David

comment:27 in reply to: ↑ 26 Changed 4 years ago by jsrn

Awesome!

for now, punctured_positions (the getter for positions) always return a set, even if one passed an int/Integer/list at construction time. I think it's fine like that, but if you think it can be confusing/annoying for the user, I can change its behaviour so it returns the exact data one passed at construction time.

I think this makes complete sense.

comment:28 Changed 4 years ago by git

  • Commit changed from cef203fb5010d120386968f432648a60338c9fcc to 2d13b55e6603e014bc7c79d3852384f3a314677f

Branch pushed to git repo; I updated commit sha1. New commits:

2d13b55Changed punctured method to support a list of vectors

comment:29 Changed 4 years ago by dlucas

Hello,

I fixed a bug: Punctured Code's decoder was not working if its original decoder was a list decoder. I changed puncture method (which is BTW now called _puncture as it's a helper function): it can now puncture a list of vectors or a vector. This takes care of the list-decoding issue.

Best,

David

comment:30 Changed 4 years ago by git

  • Commit changed from 2d13b55e6603e014bc7c79d3852384f3a314677f to 5bb39feb29a9c8da1ea503b4a01527b312520aaa

Branch pushed to git repo; I updated commit sha1. New commits:

771f3fedecoder_type works now properly on uninstantiated classes
0d8ed86Merge branch 'develop' into decoder_type_method_for_uninstanciated_classes
c2777ffMerge branch 'decoder_type_method_for_uninstanciated_classes' into talk_secret
021b841Merge branch 'punctured_code' into talk_secret
5bb39feFixed bug related to sets and sorting in decode_to_code

comment:31 Changed 4 years ago by dlucas

  • Branch changed from u/dlucas/punctured_codes to u/dlucas/punctured_code
  • Commit 5bb39feb29a9c8da1ea503b4a01527b312520aaa deleted

comment:32 Changed 4 years ago by git

  • Commit set to 36e71c57401929089bc1938aa016b251b54f8916

Branch pushed to git repo; I updated commit sha1. Last 10 new commits:

564760fRewrote _punctured_form in grs.py
0746385Changed method _punctured_form in linear_code.py
a27ca48Rewrote method puncture in punctured_code
b0decacChanged docstring in structured_representation
d16f13aSimplified generator_matrix computation in the encoder
21b3570Added generic decoders as available decoders for punctured codes
659105cSome tweaking to punctured_code.py
cef203fReinstated list as an allowed input type for punctured codes. Changed str methods for punctured codes.
2d13b55Changed punctured method to support a list of vectors
36e71c5Fixed bug related to sets and sorting in decode_to_code

comment:33 Changed 4 years ago by dlucas

Sorry, I messed up with my git branches in the push before that. Anyway, my last commit fixes a bug related to decode_to_code: the loops I was using to insert elements in lists were failing because they were expecting to receive the positions to insert as a sorted list of positions. I wrote a helper functions to deal with this, and it now works fine. This is still open for review (and this time, I cannot find any more issues).

Best,

David

comment:34 Changed 4 years ago by jlavauzelle

  • Status changed from needs_review to needs_work

Hi David,

A few remarks -- the last ones, I hope:

  • in structured_representation method: it doesn't work when one concatenates puncturings. For example, the following script
        C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
        P = codes.PuncturedCode(C, [0,6])
        D = P.structured_representation()
        P2 = codes.PuncturedCode(P, [0,4])
        D2 = P2.structured_representation()
    

fails at fifth line (so double puncturing actually works, only structured_representation crashes). From the crash log, it seems that you don't build the right set of pts.

  • in PuncturedCodePuncturedMatrixEncoder, one can construct this encoder while passing as argument a code which is not a punctured one. As a direct consequence, this scripts fails:
        C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
        E = codes.encoders.PuncturedCodePuncturedMatrixEncoder(C)
        G = E.generator_matrix()
    

The question is: is it the user's mistake, or should the code handle it ? Maybe it's reasonable to throw an error.

  • same remark for your decoder: one can pass a non-punctured code as parameter.
  • in __init__ function of PuncturedCodeOriginalCodeDecoder, maybe it is better to use True/False boolean values instead of error_erasure = 0 and error_erasure = 1.
  • I also prefer the compact notation:
        e_list = [ one if i in pts else zero for i in range(Cor.length()) ]
    

instead of:

    e_list = []
    for i in range(Cor.length()):
        if i in pts:
            e_list.append(one)
        else:
            e_list.append(zero)
  • finally, a typo, l.441 : default instead of dafault

I did not perform exhaustive tests of all the combinations of codes/encoders/decoders/strategies, because it is too long. So given the complexity of the feature you're trying to implement, it's still possible there remains some minor errors -- especially with extreme settings -- but maybe a real deep use of this class (by actual users) could make them appear more easily.

Thus, when you're done fixing the previous errors, I give the green light.

Julien

comment:35 Changed 3 years ago by git

  • Commit changed from 36e71c57401929089bc1938aa016b251b54f8916 to 8a3a084f960197006447424522ea768c4a89bb62

Branch pushed to git repo; I updated commit sha1. New commits:

ddedfc3Update to latest beta
f964812Fixed bug in index.rst file
6afe4a9Fixed bug related in structured_representation
890b309Added input check for encoder and decoder
8a3a084Tweaks: fixed typo, removed not Pythonic syntax

comment:36 Changed 3 years ago by dlucas

Hello Julien,

Thanks for your comments!

I fixed the bug in structured_representation. I also fixed a bug I found while compiling the documentation, added input sanitization in the encoder and the decoder (speaking of which, these checks are missing almost everywhere... I'll open a new ticket to fix that in other classes).

I also removed non-Pythonic syntax (fun fact: four lines below the ones you noticed, error_erasure was set to True or False... Consistency, I haz it).

Should be ok now. FYI, I tested locally error-erasure, list-decoder and "classical" decoders, standard decoding tests passed.

David

comment:37 Changed 3 years ago by dlucas

  • Milestone changed from sage-7.2 to sage-7.3
  • Status changed from needs_work to needs_review

comment:38 Changed 3 years ago by jlavauzelle

Hi David,

Your fix of structured_representation doesn't work. Try this:

    C = codes.GeneralizedReedSolomonCode(GF(7).list(), 4)
    Cp = codes.PuncturedCode(C, [5,6])
    Cp2 = codes.PuncturedCode(Cp, [1,2])

If you print list_pts at the end of the method, you'll see it is not [1,2,5,6] as expected.

Here is a piece of code which should work:

        while(isinstance(C, PuncturedCode)):
            cur_pts = list(C.punctured_positions())
            list_len = len(list_pts)
            for p in cur_pts:
                for i in range(list_len):
                    if (p <= list_pts[i]):
                        list_pts[i] += 1
            list_pts += cur_pts
            C = C.original_code()
        return C._punctured_form(set(list_pts))

I'm not satisfied about its algorithmic complexity, so you're free to improve it, even though I cannot believe that this piece of code will ever become a bottleneck of any algorithm.

I agree with the other fixes :)

Julien

comment:39 Changed 3 years ago by dlucas

If you print list_pts at the end of the method, you'll see it is not [1,2,5,6] as expected.

Oh yes you're right, my solution only works if one erases the same coordinates repeatedly...

comment:40 Changed 3 years ago by git

  • Commit changed from 8a3a084f960197006447424522ea768c4a89bb62 to 45d4ee111d235690820ef35269fb8991aa6f4abb

Branch pushed to git repo; I updated commit sha1. New commits:

45d4ee1Fixed structured_representation

comment:41 Changed 3 years ago by dlucas

I took your solution and tested it on several instances. I'm fine with it, even if it's not perfect.

David

comment:42 Changed 3 years ago by jlavauzelle

  • Reviewers set to Julien Lavauzelle
  • Status changed from needs_review to positive_review

Now that's ok for me. Great job again !

Julien

comment:43 Changed 3 years ago by dlucas

Amazing, thanks!

comment:44 Changed 3 years ago by vbraun

  • Branch changed from u/dlucas/punctured_code to 45d4ee111d235690820ef35269fb8991aa6f4abb
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.