Opened 7 years ago
Closed 7 years ago
#18813 closed enhancement (fixed)
New decoding structure for linear codes
Reported by:  David Lucas  Owned by:  

Priority:  major  Milestone:  sage6.10 
Component:  coding theory  Keywords:  
Cc:  Johan Rosenkilde, JeanPierre Flori, Vincent Delecroix, Luca De Feo, Clément Pernet  Merged in:  
Authors:  David Lucas  Reviewers:  Clément Pernet 
Report Upstream:  N/A  Work issues:  
Branch:  cf5ac32 (Commits, GitHub, GitLab)  Commit:  cf5ac32123eaff98d9620f6ce21afb9887a3c758 
Dependencies:  #18376  Stopgaps: 
Description (last modified by )
For now, linear codes don't have a proper structure for decoding, where "decoding" means using an errorcorrection algorithm to recover the original codeword.
We propose in this ticket a new structure, designed to handle word > codeword and word > message transformations.
We provide the following:
 An abstract class
Decoder
, for inheritances purposes. EveryDecoder
will inherit from this class. It contains: a default constructor;
decode_to_code
anddecode_to_message
default implementation; several methods to get information on the instance of the class and
 explanations on what to do to create a new decoder subclass.
 An exception class
DecodingFailure
, for errors related to decoding algorithms.
 Methods for decoding handling in
AbstractLinearCode
(see Design)
Design
What "Decoding" means depends on the algorithm used to find and correct the errors in a provided word. For instance, there are decoding algorithms able to locate and correct up to a certain amount of errors, and will fail if there is more errors than this bound. Others will return a list of codewords, which contain the original codeword. Some decoding algorithms are probabilistic, some are deterministic.
To describe the variety of behaviours, we introduce a list of "types", called _decoder_type
which is a set of keywords describing the semantics of the decoding algorithm represented by the class.
Decodig usually recovers codewords close to the provided words. Some decoding algorithms, however, naturally decode directly to the message from which the codeword was obtained. This is e.g. the case with the BerlekampWelch or the GuruswamiSudan algorithms. Since both uses are natural and in common use in theory and practice, we provide two different methods: decode_to_code and decode_to_message. The former corrects the errors and return an element from the code, while the latter corrects the errors and returns an element from a message space. Thanks to a default implementation, reimplmenting both is not needed. One only needs to override one of these two methods in a Decoder class, and the default implementation will be used to perform the other one.
As is discussed in #18376, a code can have many message spaces, and the map between a message space and the code is given by an Encoder. The decode_to_message
will pertain to a certain such Encoder
, so we introduce the string field _connected_encoder_name
to point to this encoder. A user can acquire the actual Encoder
object by calling Decoder.connected_encoder
.
Furthermore, we propose here the exact same structure of registration for decoders as the one we introduced in #18376, and the same handling of it on the level of codes: each code will have a default_decoder
, so if one does not want to be bothered by multiple decoding algorithms when he just wants to decode a word y
for his code C
, all he has to do is to write C.decode_to_code(y)
, and the default decoder will be used to correct the errors in y
.
With this structure, we are able to represent different decoding algorithms for each code, even if they have different behaviour. We also have a versatile structure which allows specific experimentation on chosen decoding algorithms alongside with generic decodings in which the algorithm does not matter.
Notes
This ticket relies on the encoding structure proposed on trac #18376, as every decoder has to be linked with an encoder.
The original decoding algorithms, which were in decoder.py are now in two different Decoder
objects in linear_code.py
. I removed the guava one as the call to guava's decoding algorithm returned a guava error.
Change History (42)
comment:1 Changed 7 years ago by
Description:  modified (diff) 

comment:2 Changed 7 years ago by
Branch:  → u/dlucas/decoder 

comment:3 Changed 7 years ago by
Authors:  → David Lucas 

Cc:  Johan Rosenkilde JeanPierre Flori Vincent Delecroix Luca De Feo added 
Commit:  → c2583d8334f5f9ff962a91a6e19d5978867a267c 
Dependencies:  → #18376 
Status:  new → needs_review 
comment:4 Changed 7 years ago by
Description:  modified (diff) 

comment:5 Changed 7 years ago by
Commit:  c2583d8334f5f9ff962a91a6e19d5978867a267c → ee33302e13d342dddbd0b7cdee9d5df035eb9a34 

comment:6 Changed 7 years ago by
Milestone:  sage6.8 → sage6.9 

comment:7 Changed 7 years ago by
I modified this tickets accordingly with reviewer's remarks in #18376, as these two are quite symmetrical. It's still ready for review.
comment:8 Changed 7 years ago by
Cc:  Clément Pernet added 

comment:9 Changed 7 years ago by
Commit:  ee33302e13d342dddbd0b7cdee9d5df035eb9a34 → 218b777cba67afadb97f6f3ed6564042528dfecc 

Branch pushed to git repo; I updated commit sha1. New commits:
218b777  Updated to 6.10beta0 and fixed some elements according to #18376

comment:10 Changed 7 years ago by
I merged with latest beta and resolved conflicts.
I also updated some things accordingly with what was said in #18376, fixed a typo in encoder and added forgotten GPL licence blocks in encoders_catalog
and decoders_catalog
.
Still pending for review.
comment:11 Changed 7 years ago by
Commit:  218b777cba67afadb97f6f3ed6564042528dfecc → d5bdb8ef1f7109d18c17736940aab92c8057d429 

comment:12 Changed 7 years ago by
Milestone:  sage6.9 → sage6.10 

I fixed a few things patchbot was complaining about. It'll probably continue to complain on coverage but I don't see the point of adding doctests on a deprecated function so if the reviewer agrees I'm going to ignore this warning. This is still pending for review.
comment:13 Changed 7 years ago by
Commit:  d5bdb8ef1f7109d18c17736940aab92c8057d429 → fa635145ff39b046cbb2b4c5525faf46f8deef29 

Branch pushed to git repo; I updated commit sha1. New commits:
fa63514  Fixed conflicts after merge

comment:14 Changed 7 years ago by
I updated to latest beta and fixed the conflicts that arose after that merge. It's ready for review again.
comment:15 Changed 7 years ago by
Commit:  fa635145ff39b046cbb2b4c5525faf46f8deef29 → 0f29ad3d1d09e1c3b7f26b1c6410076b5f0997f5 

Branch pushed to git repo; I updated commit sha1. New commits:
0f29ad3  Lazy imported linear_code in sd_codes

comment:16 Changed 7 years ago by
linear_code
module is now lazily imported in sd_codes
, so decoder
is no longer available in global namespace.
comment:17 Changed 7 years ago by
The semantic of the field: input_space
of a decoder is unclear.
It seems to me that it should corresponds to the ambient space of the associated code, and it is implemented this way line 246 of decoder.py or line 4028 of linear_code.py.
But line 75 of decoder.py, it is set to code.base_field()
.
Also a clear definition of what this field means should be added somewhere.
comment:18 Changed 7 years ago by
In the current state, the output of the hash function does not depend on the code, but only on its dimensions. (changing some coeffcients in the generating matrix G line 107 of decoder.py does not change the hash of the linear code and therefore does not change the hash of the decoder neither.
This remark is just to point out this inconsistency: either add a hash in encoder and in linear_code or remove the hash here.
comment:19 Changed 7 years ago by
line 183 of decoder.py: decode_to_message
decodes r
to the message space of self's connected_encoder (not code)
comment:20 Changed 7 years ago by
Overriding method decoding_radius
is optional, but documentation (line 41 of decoder.py) says it is required.
comment:21 Changed 7 years ago by
Commit:  0f29ad3d1d09e1c3b7f26b1c6410076b5f0997f5 → ddd1eb1f28e3d34dfc83a8ae401495b7cad432b8 

comment:22 Changed 7 years ago by
I added a clear definition of input_space
in the constructor of Decoder
in decoder.py
, fixed the errors you noticed and added a hash function to LinearCode
, with an extra doctest which check if the bug you found is fixed (see ll 39083911 in linear_code.py
).
comment:24 Changed 7 years ago by
minor spelling corrections in linear_code.py
 filling > filling in 4 occurences
 an > a lines 829 and 824
 lines 1613 and 1639: remove "test"
comment:25 Changed 7 years ago by
line 3904 of linear_code.py: differs > differ
In the same hash method: the XOR of hash(Str) twice is equivalent to applying the identity. One of them should be removed or replaced by hash(G) maybe. not sure to understand the motivation behind doing this XOR. hash((G,Str)) should do the job.
comment:26 Changed 7 years ago by
"XORing" hash(Str) twice is a mistake, it should (and will) be [...] ^ hash(G) ^ hash(Str)
.
Regarding the motivation behing the xor, I did as advised in Python doc crossed with this useful (imho) question on stack overflow
it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.
comment:27 Changed 7 years ago by
I'm confused by the syndrome method. It really looks to me as a "NearestNeighbor?" decoding, not as syndrome decoding.
Syndrome decoding means (following Roth's Book) 1/ compute the syndrome s=Hy where H is the parity check matrix 2/ Among the coset of vectors sharing the same syndrome, return the coset leader, with minimal Hamming weight
I don't see any parity check matrix here, and the output is simply the leader of the coset of all r+C vectors.
comment:28 Changed 7 years ago by
I definitely agree. But these methods were written before me, and all I did here was to make them fit in the new structure. Writing a true (and efficient) syndrome decoder is of course on the map, as a short term goal, but I think it does not belong to this ticket.
comment:30 Changed 7 years ago by
Commit:  ddd1eb1f28e3d34dfc83a8ae401495b7cad432b8 → 7c965dd5aeffc1008809f3d9143999fbee76dd5a 

Branch pushed to git repo; I updated commit sha1. New commits:
7c965dd  Integrated reviewer's comments

comment:31 Changed 7 years ago by
Last push fixes all your comments but the last one. I wait for the (potential) other remarks before writing another patch :)
comment:32 Changed 7 years ago by
Commit:  7c965dd5aeffc1008809f3d9143999fbee76dd5a → 68ab6b7bf4449c99b7fd69a0346dc176a1ecf8f8 

Branch pushed to git repo; I updated commit sha1. New commits:
68ab6b7  Fixed typo in decoder.py

comment:33 Changed 7 years ago by
As you just told me it was ok, here's the fix for the typo you noticed.
comment:34 Changed 7 years ago by
Commit:  68ab6b7bf4449c99b7fd69a0346dc176a1ecf8f8 → 51a78747358541d32a6520f767f9eed79bb3e068 

Branch pushed to git repo; I updated commit sha1. New commits:
51a7874  Fixed typo in syndrome method docstring

comment:35 Changed 7 years ago by
Status:  needs_review → positive_review 

All my remarks have been addressed. I'm giving this ticket a positive review.
comment:36 Changed 7 years ago by
Reviewers:  → cpernet 

comment:37 Changed 7 years ago by
Reviewers:  cpernet → Clément Pernet 

comment:38 Changed 7 years ago by
Status:  positive_review → needs_work 

Merge conflict, possibly with #16080
comment:39 Changed 7 years ago by
Commit:  51a78747358541d32a6520f767f9eed79bb3e068 → cf5ac32123eaff98d9620f6ce21afb9887a3c758 

Branch pushed to git repo; I updated commit sha1. New commits:
c9f3a98  trac #16080 changing imports of urllib,urrlib2,urlparse for py3 compatibility

ab05e06  trac #16080 correct one doctest

3ac77d3  trac #16080 move back 3 imports inside functions

99699f6  trac #16080 now redone using six.moves

f848c3b  trac #16080 correcting some imports

3098f5f  Merge branch 'public/16080' into 6.10.b5

cf5ac32  Fixed merge conflict with 16080

comment:40 Changed 7 years ago by
Status:  needs_work → needs_review 

I merged in #16080 and fixed the conflict in linear_code.py
.
This should do the trick!
comment:41 Changed 7 years ago by
Status:  needs_review → positive_review 

This merge works for me. Positive review.
comment:42 Changed 7 years ago by
Branch:  u/dlucas/decoder → cf5ac32123eaff98d9620f6ce21afb9887a3c758 

Resolution:  → fixed 
Status:  positive_review → closed 
New commits:
New abstract structure for decoders
Moved decoding methods from decoder.py to decoder classes in linear_code.py
Changes related to decoders in linear_code.py
Added a decoders_catalog file, added the import, added decoderrelated files to the index