Opened 7 years ago
Closed 7 years ago
#18269 closed enhancement (fixed)
A new structure for experimentation on decoding: communication channels
Reported by: | dlucas | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-6.7 |
Component: | coding theory | Keywords: | |
Cc: | jsrn | Merged in: | |
Authors: | David Lucas | Reviewers: | Vincent Delecroix |
Report Upstream: | N/A | Work issues: | |
Branch: | 383e67c (Commits, GitHub, GitLab) | Commit: | 383e67c01119ce17f0ab58fd0833016a9bcdad01 |
Dependencies: | Stopgaps: |
Description
For now, there is no structure to easily add errors in codewords. If one wants to experiment with decoding algorithms on codes, the only possible way is to add errors "by hand", which is rather tedious.
We propose here a new structure, based on communication channels, on purpose to facilitate the experimentation process with decoding algorithms.
For now, our structure consists of:
- an abstract class for channels,
- a channel which adds a specific number of errors at random positions to each provided vector
- a channel which adds a specific number of errors at random positions to each provided vector, and erases a specific number of random positions
With this new structure, creating n
errors in a vector can be done in one line of code into Sage.
Adding a new Channel class should also be easy: all one needs to do is to inherit from the abstract class, and override and implement a method.
For better consistency, channels can only be accessed using channels.<name> (see #15445) from the global namespace.
Change History (32)
comment:1 Changed 7 years ago by
- Branch set to u/dlucas/channels
comment:2 Changed 7 years ago by
- Commit set to 62a2447c45d3d2878f2da037ea4721464bef2ea4
- Status changed from new to needs_review
comment:3 Changed 7 years ago by
- Status changed from needs_review to needs_work
Hello,
- You might consider inheriting from
SageObject
(sage.structure.sage_object
).
- I would rather hide them under
codes.channels.<name>
. That way you can avoid the filechannels_catalog
.
- Why
sage.coding.channel_constructions
and notsage.coding.channels
?
- Why not setting
__call__
as an alias oftransmit
. That would allowsage: channel(msg)
which is convenient.
- Would be cool to have cross references with actual codes in the documentation. Right now the documentation looks like this is an isolated block.
- You could add various tests to check that if you reset the seed you get the same answer. Might be useful to have this doctested and also for users to know that they can use it.
sage: set_random_seed(10) sage: randint(0,5) 5 sage: set_random_seed(10) sage: randint(0,5) 5 sage: set_random_seed(10) sage: randint(0,5) 5
- In
_random_error_vector
why not starting withvect = [F.random_element() for _ in range(n)]
- The functions
_random_error_position
,_random_error_vector
and_tuple_to_integer
will not appear in the documentation. You should either remove the_
at the begining or add someautomethod
sphinx directives (you have to look in the code to find how to do it)
- The function
_random_error_position
should be moved toprandom
. It is useful in its own and quite independent. Note that it is quite similar tosage: sample(range(10), 4) [1, 7, 6, 4]
- For
_random_error_vector
I would rather useF._nonzero_random_element()
that is useful in your casesage: GF(2)._random_nonzero_element() 1
of course the default implementation is to call.random_element
until something nonzero is found. Hence you can use
- Why do you need this
_tuple_to_integer
? Why not using directlyrandint(*value)
?sage: value = [0,5] sage: randint(*value) 3 sage: value = (0,5) sage: randint(*value) 4
Eleven is a nice prime number. So I will stop there.
Vincent
comment:4 Changed 7 years ago by
Hello,
Thank you for the input!
I would rather hide them under
codes.channels.<name>
. That way you can avoid the filechannels_catalog
. Whysage.coding.channel_constructions
and notsage.coding.channels
?
My idea here was to mimic the way codes were stored and hide: a code_constructions file for the codes, and a codes_catalog file for the list of codes. I wanted to do the same with channels for consistency. That's why I chose the name channel_constructions. Hiding the channels under codes.channels.<name> might be confusing (imho): one can think that channels depends on codes, which is not the case as they are completely independant objects. So they can be used on any vector, not only on codewords, or for experimentation on decoding. Which also explains why the documentation looks like an isolated block, I wrote it as a new object, independant from codes.
The functions
_random_error_position
,_random_error_vector
and_tuple_to_integer
will not appear in the documentation. You should either remove the_
at the begining or add someautomethod
sphinx directives (you have to look in the code to find how to do it)
Well, that was done on purpose as one should not use these methods except when implementing a new Channel object. Now, with a second thought about this, it's not that smart as one will have to browse through the code to find these methods. I think I'll remove the underscore to make them appear in the doc, while putting the statement "This is a helper function, for internal use only." in a ..NOTE::
block to highlight this.
For the rest of your comments, thanks a lot for the advice on these random
issues. I'll change the code accordingly.
David
comment:5 Changed 7 years ago by
- Commit changed from 62a2447c45d3d2878f2da037ea4721464bef2ea4 to 874deac06298703bc9cd9d8354302830739d6089
comment:6 Changed 7 years ago by
random_error_position
is not an appropriate name. Perhaps range_sample
or sample_range
. The description is not appropriate as well. Should be something like
def sample_range(n, k): """ Return a subset of [0,...,n-1] of size k with uniform probability. INPUT: - ``n``, ``k`` - integers OUTPUT: An ordered list. AUTHORS: This function is taken from codinglib (https://bitbucket.org/jsrn/codinglib/) written by Johan Nielsen. """
In the same function you should also:
- check the input (just do
n = int(n); k = int(k)
) - rename the variable
error_position
into something more meaningful
And for the line sage:set_random_seed(10)
there must be a space sage: set_random_seed(10)
.
comment:7 Changed 7 years ago by
I did some changes, as stated by commit messages above.
Some remarks:
- I kept
tuple_to_integer
because the user can pass either an integer or an interval asnumber_errors
parameter for the channels. If the user passes an integer, thenrandint
won't work. Of course it is possible to do the check anytime you callnumber_errors
(if integer, return it, else do a randint and return the result) but it's rather tedious. Especially because it's something that will be used by a some channels. I think it's easier to call a single method rather than copying theif
statement each time. Maybetuple_to_integer
is not really a good name though.
random_error_vector(n, F, error_positions)
returns a vector of lengthn
overF
with0
s in every position except those listed inerror_positions
. Because of this, callingvect = [F.random_element() for _ in range(n)]
inrandom_error_vector
does not really fit, because it will create a vector with non-controlled zero positions, so we'll need to set all positions except those intoerror_positions
afterwards. I think create a zero vector of sizen
and add random values at some positions is better.
- I did not change the documentation, nor the catalog stuff (see comment 4). If you think that Channels are actually more linked to codes than to anything else, I can make some changes.
David
comment:8 Changed 7 years ago by
- Commit changed from 874deac06298703bc9cd9d8354302830739d6089 to 5f31c5cef52b9b0bdbd905f4ff132e344446e766
Branch pushed to git repo; I updated commit sha1. New commits:
5f31c5c | Changed name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces
|
comment:9 Changed 7 years ago by
- Status changed from needs_work to needs_info
New commits:
5f31c5c | Changed name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces
|
comment:10 follow-up: ↓ 12 Changed 7 years ago by
Replying to dlucas:
I did some changes, as stated by commit messages above.
Some remarks:
- I kept
tuple_to_integer
because the user can pass either an integer or aninterval as
number_errors
parameter for the channels. If the user passes an integer, thenrandint
won't work. Of course it is possible to do the check anytime you callnumber_errors
(if integer, return it, else do a randint and return the result) but it's rather tedious. Especially because it's something that will be used by a some channels. I think it's easier to call a single method rather than copying theif
statement each time. Maybetuple_to_integer
is not really a good name though.
The name is very bad. Moreover you can convert an integer a
into the tuple (a,a)
in the constructor. With your way of doing it you avoid the function call to randint
but with your version you always add an extra function call to tuple_to_integer
.
You should replace all occurrence of hasattr(X, '__iter__')
by isinstance(X, (tuple, list))
. Having an __iter__
method does not charaterize tuples.
random_error_vector(n, F, error_positions)
returns a vector of lengthn
over
F
with0
s in every position except those listed inerror_positions
. Because of this, calling `vect = [F.random_element() for _ in range(n)]in
random_error_vector` does not really fit, because it will create a vector with non-controlled zero positions, so we'll need to set all positions except those intoerror_positions
afterwards. I think create a zero vector of sizen
and add random values at some positions is better.
Yes. You are right!
- I did not change the documentation, nor the catalog stuff (see comment 4).
If you think that Channels are actually more linked to codes than to anything else, I can make some changes
You should tell me ;-) Cryptogtaphers might like it but they do use strings not vectors in a vector space over a finite field.
- When you do
__call__ = transmit
the documentation is copied as well. Did you trysage: Chan = channels.StaticErrorRateChannel(GF(11), 2) sage: Chan.__call__?
- I would rather implement
StaticErrorRateChannel.transmit_unsafe
asdef transmit_unsafe(self, msg): w = msg.copy() V = self.input_space() for i in sample_range(V.dim(), randint(*self.nb_errors())): w[i] = V.random_element() return w
it would be more direct. Moreover it shows thatsample_range
would better be an iterator instead of returning a list. By the way, this functionsample_range
has currently linear complexity inn
, isn't it possible to make it linear ink
?
- In the main documentation class, it is useful to document the main methods. In other words, when reading
sage: channels.StaticErrorChannel?
I should be able to discvoer the methodtransmit
(and its alias usage with__call__
).
- In the very top documentation, you can have links by writing
class:`AbstractChannel`
instead of*AbstractChannel*
.
Vincent
comment:11 Changed 7 years ago by
- Status changed from needs_info to needs_work
comment:12 in reply to: ↑ 10 ; follow-up: ↓ 13 Changed 7 years ago by
- I would rather implement
StaticErrorRateChannel.transmit_unsafe
asdef transmit_unsafe(self, msg): w = msg.copy() V = self.input_space() for i in sample_range(V.dim(), randint(*self.nb_errors())): w[i] = V.random_element() return wit would be more direct. Moreover it shows thatsample_range
would better be an iterator instead of returning a list. By the way, this functionsample_range
has currently linear complexity inn
, isn't it possible to make it linear ink
?
Yes it is (the solution is not exactly linear though)
sage: S = Subsets(100,4) sage: timeit("S.random_element()") 625 loops, best of 3: 54.2 µs per loop sage: timeit("sample_range(100,4)") 625 loops, best of 3: 57.7 µs per loop sage: S = Subsets(1000,4) sage: timeit("S.random_element()") 625 loops, best of 3: 61.2 µs per loop sage: timeit("sample_range(1000,4)") 625 loops, best of 3: 510 µs per loop
It is roughly the same speed for n=100
but for n=1000
yours is twice slower.
comment:13 in reply to: ↑ 12 Changed 7 years ago by
Replying to vdelecroix:
Moreover it shows that
sample_range
would better be an iterator instead of returning a list. By the way, this functionsample_range
has currently linear complexity inn
, isn't it possible to make it linear ink
?Yes it is (the solution is not exactly linear though) ... It is roughly the same speed for
n=100
but forn=1000
yours is twice slower.
This code for subsets is just using
sage: sample(range(n), k)
So I guess this is currently the best way to do it. And moreover, looking at Python doc
To choose a sample in a range of integers, use xrange as an argument. This is especially fast and space efficient for sampling from a large population: sample(xrange(10000000), 60)
Sorry for my previous remarks. You should just get rid of sample_range(n,k)
and use sample(xrange(n),k)
.
Vincent
comment:14 Changed 7 years ago by
- Commit changed from 5f31c5cef52b9b0bdbd905f4ff132e344446e766 to f93852e00d00dd694bed6c068d87cd17ad1a5d3c
comment:15 Changed 7 years ago by
- Status changed from needs_work to needs_review
Thank you again for the advice!
The only thing I did not do is to add cross-references between channels and codes. I think it's not the best time to do it, as there is no proper Encoder/Decoder? structure. But when we will propose our structure for decoding, I think it will be interesting to illustrate the error correction using channels to create errors. Before that, I let it as is.
comment:16 Changed 7 years ago by
- Status changed from needs_review to needs_work
Hello,
- I do not like this
AbstractChannel
. If you lookMorphism
the top (abstract) class is notAbstractMorphism
butMorphism
(seesage.categories.morphism.Morphism
).
- In my remark 12 I was suggesting to actually remove
def __call__(self, msg): return self.transmit(msg)
and keep only__call__ = transmit
It avoids duplication of the documentation.
- You can remove the
raise NotImplementedError
in the abstract method. By constructionsage: class A: ....: @abstract_method ....: def ploum_ploum(self): ....: "a hello function!" sage: a = A() sage: a.ploum_ploum() Traceback (most recent call last): ... NotImplementedError: <abstract method ploum_ploum at 0x7f1db2e02758>
- In
ValueError: The total number of errors and erasures can exceed the dimension of the input space
shouldn't it becan not
?
- Do you really care about equality (i.e. the method
__eq__
)? If you do not then remove the method. If you do then you must also implement__ne__
(and possibly__hash__
). Otherwisesage: class A: ....: def __eq__(self, other): ....: return True sage: a = A() sage: b = A() sage: a == b True sage: a != b True
Further remarks for later:
- If you look at the table of contents of coding you have
- Index of Channels - Channels - Index of Codes - Linear Codes ...
Not very nice. Though not for this ticket. Think about it for later on the table of contents of code documentation is a mess! But of course the priority is to clean the code.
- A more global remark: it is generally a bad strategy to implement an isolated block of code and then to try to mix things later. Especially if the code in itself is basically useless.
Vincent
comment:17 Changed 7 years ago by
- Commit changed from f93852e00d00dd694bed6c068d87cd17ad1a5d3c to 2bc05906eb2db277bac427ec2ea55a47f95f1f26
comment:18 follow-up: ↓ 19 Changed 7 years ago by
- Status changed from needs_work to needs_review
If you look at the table of contents of coding you have [...] Not very nice. Though not for this ticket. Think about it for later on the table of contents of code documentation is a mess! But of course the priority is to clean the code.
I agree. It's definitely something to keep in mind.
A more global remark: it is generally a bad strategy to implement an isolated block of code and then to try to mix things later. Especially if the code in itself is basically useless.
Got it!
I did the suggested changes in the code. I removed __eq__
, as I do not really picture a case in which one would need to compare two channels.
David
comment:19 in reply to: ↑ 18 Changed 7 years ago by
Replying to dlucas:
I removed
__eq__
, as I do not really picture a case in which one would need to compare two channels.
Hence my question: it will really depend on its usage!
Vincent
comment:20 Changed 7 years ago by
- Reviewers set to Vincent Delecroix
- Status changed from needs_review to needs_work
Hello,
For Sage objects, you should use _repr_
and not __repr__
. It is useful for example if you want to set a custom name
sage: class A(SageObject): ....: def _repr_(self): ....: return "Default repr" sage: a = A() Default repr sage: a.rename('I am not me') sage: a I am not me sage: a.rename(None) sage: a Default repr
There are two occurrences + one mention in the documentation.
Beyond that it looks good!
Vincent
comment:21 Changed 7 years ago by
- Commit changed from 2bc05906eb2db277bac427ec2ea55a47f95f1f26 to 723a55c71861db896335aa75763e8bba85a2a3b0
Branch pushed to git repo; I updated commit sha1. New commits:
723a55c | Replaced __repr__ by _repr_
|
comment:22 Changed 7 years ago by
- Status changed from needs_work to needs_review
Hello,
For Sage objects, you should use
_repr_
and not__repr__
.
Ok, good to know! Modification done.
comment:23 Changed 7 years ago by
- Status changed from needs_review to needs_work
- In each class/function/method the documentation should be:
- one line description
- then more details in one or several paragraphs if needed. It is not the case in
StaticErrorRateChannel
andErrorErasureChannel
.
Moreover, in a class you should not write as a description
Construct a X
. It is rather simplyX
. The reason is because it is the documentation of the class itself that you will see with:sage: Chan = channels.ErrorErasureChannel(GF(59)^40, 2, 2) sage: Chan?<enter>(BTW there is a nice short cut to build vector spaces:
K^n
just works!)
- You can simplify a bit the
ErrorErasureChannel._repr_
. By doing things in two partsdef format_interval(t): return str(t[0]) if t[0] == t[1] else 'between %s and %s' %(t[0], t[1]) def _repr_(): return "Error-and-erasure channel creating %s errors and %s erasures' %( format_interval(self.number_errors()), format_interval(self.number_erasures()))
And if you really care about pluralizationdef format_interval(t, singular, plural): name = singular if t[1] < 2 else plural if t[0] == t[1]: return '%s %s' %(t[0], name) else: return 'between %s and %s %s' %(t[0], t[1], name)
Suchformat_interval
can be used in all your_repr_
methods.
- You can simplify
ErrorErasureChannel.transmit_unsafe
. Notice that the output ofsample
is already randomized (you can have a look at the code source which is pure python) and it is specified in its documentationThe resulting list is in selection order so that all sub-slices will also be valid random samples
Hence, you can simply cut your errors into two parts aserrors = sample(xrange(n), number_errors + number_erasures) error_positions = errors[:number_errors] erasure_positions = errors[number_errors:]
comment:24 Changed 7 years ago by
- Commit changed from 723a55c71861db896335aa75763e8bba85a2a3b0 to 768ed9cd9e827fa833c9244cfade1ea63e703216
comment:25 Changed 7 years ago by
- Status changed from needs_work to needs_review
comment:26 Changed 7 years ago by
- Status changed from needs_review to needs_work
- In the string representation of channels the input space/output space does not appear
sage: n_err, n_era = 21, 21 sage: Chan = channels.ErrorErasureChannel(GF(59)^50, n_err, n_era) sage: Chan Error-and-erasure channel creating 21 errors and 21 erasures
i.e. no mention ofGF(59)^50
in the above example. Is that what you want?
- For the
ErrorErasureChannel
the output space isCartesianProduct
. For mysterious reason, it is adviced to usecartesian_product
fromsage.categories.cartesian_product
(see #15425).
- In
ErrorErasureChannel.transmit_unsafe
you would better replaceit is cleaner and potentially faster (not much).-
src/sage/coding/channel_constructions.py
diff --git a/src/sage/coding/channel_constructions.py b/src/sage/coding/channel_constructions.py index 0d32ffc..8935db7 100644
a b class ErrorErasureChannel(Channel): 592 592 593 zero = V.base_ring().zero() 593 594 for i in erasure_positions: 594 message[i] = 0595 message[i] = zero 595 596 return message, erasure_vector
-
- You should add examples that check that the input vector is not modified, ie
sage: v = my_funny_vector() sage: Chan(v) ... my_funny_output .... Check that the input ``v`` is not modified:: sage: v ... should be the initial vector ...
comment:27 Changed 7 years ago by
- Commit changed from 768ed9cd9e827fa833c9244cfade1ea63e703216 to 383e67c01119ce17f0ab58fd0833016a9bcdad01
comment:28 Changed 7 years ago by
- Status changed from needs_work to needs_review
For the
ErrorErasureChannel
the output space isCartesianProduct
. For mysterious reason, it is adviced to usecartesian_product
fromsage.categories.cartesian_product
(see #15425).
Thanks for the information.
I did the changes (and added input and output space in the representation methods for channels).
comment:29 Changed 7 years ago by
- Status changed from needs_review to positive_review
Salut David,
All right! This ticket is good to go.
That would be nice to start reviewing other's ticket as well. You can have a look at the section "Reviewer Checklist" in the Sage's Developer Guide. If you want to start slowly, there are some tickets that are marked "beginner". Basically: one of your ticket gets reviewed -> you review one ticket.
Best, Vincent
comment:30 follow-up: ↓ 31 Changed 7 years ago by
Hello,
Basically: one of your ticket gets reviewed -> you review one ticket.
Ok. I'll start reviewing then :)
comment:31 in reply to: ↑ 30 Changed 7 years ago by
Replying to dlucas:
Hello,
Basically: one of your ticket gets reviewed -> you review one ticket.
Ok. I'll start reviewing then :)
Cool :-) If you have doubts, hesitations, or anything do not hesitate to post a message to sage-devel@…
comment:32 Changed 7 years ago by
- Branch changed from u/dlucas/channels to 383e67c01119ce17f0ab58fd0833016a9bcdad01
- Resolution set to fixed
- Status changed from positive_review to closed
New commits:
Abstract class for channels
New channel: static error rate
Changed naming conventions and added new channel: ErrorErasure
Channels are now part of the reference manual for coding. Added imports.