Opened 4 years ago

Closed 3 years ago

#18813 closed enhancement (fixed)

New decoding structure for linear codes

Reported by: dlucas Owned by:
Priority: major Milestone: sage-6.10
Component: coding theory Keywords:
Cc: jsrn, jpflori, vdelecroix, defeo, cpernet Merged in:
Authors: David Lucas Reviewers: Clément Pernet
Report Upstream: N/A Work issues:
Branch: cf5ac32 (Commits) Commit: cf5ac32123eaff98d9620f6ce21afb9887a3c758
Dependencies: #18376 Stopgaps:

Description (last modified by dlucas)

For now, linear codes don't have a proper structure for decoding, where "decoding" means using an error-correction algorithm to recover the original codeword.

We propose in this ticket a new structure, designed to handle word -> codeword and word -> message transformations.

We provide the following:

  • An abstract class Decoder, for inheritances purposes. Every Decoder will inherit from this class. It contains:
    • a default constructor;
    • decode_to_code and decode_to_message default implementation;
    • several methods to get information on the instance of the class and
    • explanations on what to do to create a new decoder subclass.
  • An exception class DecodingFailure, for errors related to decoding algorithms.
  • Methods for decoding handling in AbstractLinearCode (see Design)

Design

What "Decoding" means depends on the algorithm used to find and correct the errors in a provided word. For instance, there are decoding algorithms able to locate and correct up to a certain amount of errors, and will fail if there is more errors than this bound. Others will return a list of codewords, which contain the original codeword. Some decoding algorithms are probabilistic, some are deterministic.

To describe the variety of behaviours, we introduce a list of "types", called _decoder_type which is a set of keywords describing the semantics of the decoding algorithm represented by the class.

Decodig usually recovers codewords close to the provided words. Some decoding algorithms, however, naturally decode directly to the message from which the codeword was obtained. This is e.g. the case with the Berlekamp-Welch or the Guruswami-Sudan algorithms. Since both uses are natural and in common use in theory and practice, we provide two different methods: decode_to_code and decode_to_message. The former corrects the errors and return an element from the code, while the latter corrects the errors and returns an element from a message space. Thanks to a default implementation, reimplmenting both is not needed. One only needs to override one of these two methods in a Decoder class, and the default implementation will be used to perform the other one.

As is discussed in #18376, a code can have many message spaces, and the map between a message space and the code is given by an Encoder. The decode_to_message will pertain to a certain such Encoder, so we introduce the string field _connected_encoder_name to point to this encoder. A user can acquire the actual Encoder object by calling Decoder.connected_encoder.

Furthermore, we propose here the exact same structure of registration for decoders as the one we introduced in #18376, and the same handling of it on the level of codes: each code will have a default_decoder, so if one does not want to be bothered by multiple decoding algorithms when he just wants to decode a word y for his code C, all he has to do is to write C.decode_to_code(y), and the default decoder will be used to correct the errors in y.

With this structure, we are able to represent different decoding algorithms for each code, even if they have different behaviour. We also have a versatile structure which allows specific experimentation on chosen decoding algorithms alongside with generic decodings in which the algorithm does not matter.

Notes

This ticket relies on the encoding structure proposed on trac #18376, as every decoder has to be linked with an encoder.

The original decoding algorithms, which were in decoder.py are now in two different Decoder objects in linear_code.py. I removed the guava one as the call to guava's decoding algorithm returned a guava error.

Change History (42)

comment:1 Changed 4 years ago by dlucas

  • Description modified (diff)

comment:2 Changed 4 years ago by dlucas

  • Branch set to u/dlucas/decoder

comment:3 Changed 4 years ago by dlucas

  • Authors set to David Lucas
  • Cc jsrn jpflori vdelecroix defeo added
  • Commit set to c2583d8334f5f9ff962a91a6e19d5978867a267c
  • Dependencies set to #18376
  • Status changed from new to needs_review

New commits:

d79c19bNew abstract structure for decoders
d7d9520Moved decoding methods from decoder.py to decoder classes in linear_code.py
19193deChanges related to decoders in linear_code.py
c2583d8Added a decoders_catalog file, added the import, added decoder-related files to the index

comment:4 Changed 4 years ago by dlucas

  • Description modified (diff)

comment:5 Changed 4 years ago by git

  • Commit changed from c2583d8334f5f9ff962a91a6e19d5978867a267c to ee33302e13d342dddbd0b7cdee9d5df035eb9a34

Branch pushed to git repo; I updated commit sha1. New commits:

f286a64Update to 6.9beta4
ee33302Propaged comments got from #18376

comment:6 Changed 4 years ago by dlucas

  • Milestone changed from sage-6.8 to sage-6.9

comment:7 Changed 4 years ago by dlucas

I modified this tickets accordingly with reviewer's remarks in #18376, as these two are quite symmetrical. It's still ready for review.

comment:8 Changed 4 years ago by cpernet

  • Cc cpernet added

comment:9 Changed 4 years ago by git

  • Commit changed from ee33302e13d342dddbd0b7cdee9d5df035eb9a34 to 218b777cba67afadb97f6f3ed6564042528dfecc

Branch pushed to git repo; I updated commit sha1. New commits:

218b777Updated to 6.10beta0 and fixed some elements according to #18376

comment:10 Changed 4 years ago by dlucas

I merged with latest beta and resolved conflicts. I also updated some things accordingly with what was said in #18376, fixed a typo in encoder and added forgotten GPL licence blocks in encoders_catalog and decoders_catalog.

Still pending for review.

comment:11 Changed 4 years ago by git

  • Commit changed from 218b777cba67afadb97f6f3ed6564042528dfecc to d5bdb8ef1f7109d18c17736940aab92c8057d429

Branch pushed to git repo; I updated commit sha1. New commits:

431183dUpdated to 6.10beta1
d5bdb8eFixed broken doctests and added one docstring

comment:12 Changed 4 years ago by dlucas

  • Milestone changed from sage-6.9 to sage-6.10

I fixed a few things patchbot was complaining about. It'll probably continue to complain on coverage but I don't see the point of adding doctests on a deprecated function so if the reviewer agrees I'm going to ignore this warning. This is still pending for review.

comment:13 Changed 4 years ago by git

  • Commit changed from d5bdb8ef1f7109d18c17736940aab92c8057d429 to fa635145ff39b046cbb2b4c5525faf46f8deef29

Branch pushed to git repo; I updated commit sha1. New commits:

fa63514Fixed conflicts after merge

comment:14 Changed 4 years ago by dlucas

I updated to latest beta and fixed the conflicts that arose after that merge. It's ready for review again.

comment:15 Changed 4 years ago by git

  • Commit changed from fa635145ff39b046cbb2b4c5525faf46f8deef29 to 0f29ad3d1d09e1c3b7f26b1c6410076b5f0997f5

Branch pushed to git repo; I updated commit sha1. New commits:

0f29ad3Lazy imported linear_code in sd_codes

comment:16 Changed 4 years ago by dlucas

linear_code module is now lazily imported in sd_codes, so decoder is no longer available in global namespace.

comment:17 Changed 3 years ago by cpernet

The semantic of the field: input_space of a decoder is unclear. It seems to me that it should corresponds to the ambient space of the associated code, and it is implemented this way line 246 of decoder.py or line 4028 of linear_code.py. But line 75 of decoder.py, it is set to code.base_field().

Also a clear definition of what this field means should be added somewhere.

comment:18 Changed 3 years ago by cpernet

In the current state, the output of the hash function does not depend on the code, but only on its dimensions. (changing some coeffcients in the generating matrix G line 107 of decoder.py does not change the hash of the linear code and therefore does not change the hash of the decoder neither.

This remark is just to point out this inconsistency: either add a hash in encoder and in linear_code or remove the hash here.

comment:19 Changed 3 years ago by cpernet

line 183 of decoder.py: decode_to_message decodes r to the message space of self's connected_encoder (not code)

comment:20 Changed 3 years ago by cpernet

Overriding method decoding_radius is optional, but documentation (line 41 of decoder.py) says it is required.

comment:21 Changed 3 years ago by git

  • Commit changed from 0f29ad3d1d09e1c3b7f26b1c6410076b5f0997f5 to ddd1eb1f28e3d34dfc83a8ae401495b7cad432b8

Branch pushed to git repo; I updated commit sha1. New commits:

b13623eUpdate to latest beta
ddd1eb1Integrated reviewer's comments

comment:22 Changed 3 years ago by dlucas

I added a clear definition of input_space in the constructor of Decoder in decoder.py, fixed the errors you noticed and added a hash function to LinearCode, with an extra doctest which check if the bug you found is fixed (see ll 3908-3911 in linear_code.py).

comment:23 Changed 3 years ago by cpernet

linear_code.py : lines 878 and 883 are redundant.

comment:24 Changed 3 years ago by cpernet

minor spelling corrections in linear_code.py

  • filling -> filling in 4 occurences
  • an -> a lines 829 and 824
  • lines 1613 and 1639: remove "test"

comment:25 Changed 3 years ago by cpernet

line 3904 of linear_code.py: differs -> differ

In the same hash method: the XOR of hash(Str) twice is equivalent to applying the identity. One of them should be removed or replaced by hash(G) maybe. not sure to understand the motivation behind doing this XOR. hash((G,Str)) should do the job.

comment:26 Changed 3 years ago by dlucas

"XORing" hash(Str) twice is a mistake, it should (and will) be [...] ^ hash(G) ^ hash(Str).

Regarding the motivation behing the xor, I did as advised in Python doc crossed with this useful (imho) question on stack overflow

it is advised to somehow mix together (e.g. using exclusive or) the hash values for the components of the object that also play a part in comparison of objects.

comment:27 Changed 3 years ago by cpernet

I'm confused by the syndrome method. It really looks to me as a "NearestNeighbor?" decoding, not as syndrome decoding.

Syndrome decoding means (following Roth's Book) 1/ compute the syndrome s=Hy where H is the parity check matrix 2/ Among the coset of vectors sharing the same syndrome, return the coset leader, with minimal Hamming weight

I don't see any parity check matrix here, and the output is simply the leader of the coset of all r+C vectors.

+ fix the docstring "Returns the all ..."

Last edited 3 years ago by cpernet (previous) (diff)

comment:28 Changed 3 years ago by dlucas

I definitely agree. But these methods were written before me, and all I did here was to make them fit in the new structure. Writing a true (and efficient) syndrome decoder is of course on the map, as a short term goal, but I think it does not belong to this ticket.

comment:29 Changed 3 years ago by cpernet

decoder.py line 184: connectoed -> connected

comment:30 Changed 3 years ago by git

  • Commit changed from ddd1eb1f28e3d34dfc83a8ae401495b7cad432b8 to 7c965dd5aeffc1008809f3d9143999fbee76dd5a

Branch pushed to git repo; I updated commit sha1. New commits:

7c965ddIntegrated reviewer's comments

comment:31 Changed 3 years ago by dlucas

Last push fixes all your comments but the last one. I wait for the (potential) other remarks before writing another patch :)

comment:32 Changed 3 years ago by git

  • Commit changed from 7c965dd5aeffc1008809f3d9143999fbee76dd5a to 68ab6b7bf4449c99b7fd69a0346dc176a1ecf8f8

Branch pushed to git repo; I updated commit sha1. New commits:

68ab6b7Fixed typo in decoder.py

comment:33 Changed 3 years ago by dlucas

As you just told me it was ok, here's the fix for the typo you noticed.

comment:34 Changed 3 years ago by git

  • Commit changed from 68ab6b7bf4449c99b7fd69a0346dc176a1ecf8f8 to 51a78747358541d32a6520f767f9eed79bb3e068

Branch pushed to git repo; I updated commit sha1. New commits:

51a7874Fixed typo in syndrome method docstring

comment:35 Changed 3 years ago by cpernet

  • Status changed from needs_review to positive_review

All my remarks have been addressed. I'm giving this ticket a positive review.

comment:36 Changed 3 years ago by cpernet

  • Reviewers set to cpernet

comment:37 Changed 3 years ago by cpernet

  • Reviewers changed from cpernet to Clément Pernet

comment:38 Changed 3 years ago by vbraun

  • Status changed from positive_review to needs_work

Merge conflict, possibly with #16080

comment:39 Changed 3 years ago by git

  • Commit changed from 51a78747358541d32a6520f767f9eed79bb3e068 to cf5ac32123eaff98d9620f6ce21afb9887a3c758

Branch pushed to git repo; I updated commit sha1. New commits:

c9f3a98trac #16080 changing imports of urllib,urrlib2,urlparse for py3 compatibility
ab05e06trac #16080 correct one doctest
3ac77d3trac #16080 move back 3 imports inside functions
99699f6trac #16080 now redone using six.moves
f848c3btrac #16080 correcting some imports
3098f5fMerge branch 'public/16080' into 6.10.b5
cf5ac32Fixed merge conflict with 16080

comment:40 Changed 3 years ago by dlucas

  • Status changed from needs_work to needs_review

I merged in #16080 and fixed the conflict in linear_code.py. This should do the trick!

comment:41 Changed 3 years ago by cpernet

  • Status changed from needs_review to positive_review

This merge works for me. Positive review.

comment:42 Changed 3 years ago by vbraun

  • Branch changed from u/dlucas/decoder to cf5ac32123eaff98d9620f6ce21afb9887a3c758
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.