Opened 6 years ago

Closed 6 years ago

#18269 closed enhancement (fixed)

A new structure for experimentation on decoding: communication channels

Reported by: dlucas Owned by:
Priority: major Milestone: sage-6.7
Component: coding theory Keywords:
Cc: jsrn Merged in:
Authors: David Lucas Reviewers: Vincent Delecroix
Report Upstream: N/A Work issues:
Branch: 383e67c (Commits, GitHub, GitLab) Commit: 383e67c01119ce17f0ab58fd0833016a9bcdad01
Dependencies: Stopgaps:

Status badges

Description

For now, there is no structure to easily add errors in codewords. If one wants to experiment with decoding algorithms on codes, the only possible way is to add errors "by hand", which is rather tedious.

We propose here a new structure, based on communication channels, on purpose to facilitate the experimentation process with decoding algorithms.

For now, our structure consists of:

  • an abstract class for channels,
  • a channel which adds a specific number of errors at random positions to each provided vector
  • a channel which adds a specific number of errors at random positions to each provided vector, and erases a specific number of random positions

With this new structure, creating n errors in a vector can be done in one line of code into Sage. Adding a new Channel class should also be easy: all one needs to do is to inherit from the abstract class, and override and implement a method.

For better consistency, channels can only be accessed using channels.<name> (see #15445) from the global namespace.

Change History (32)

comment:1 Changed 6 years ago by dlucas

  • Branch set to u/dlucas/channels

comment:2 Changed 6 years ago by dlucas

  • Commit set to 62a2447c45d3d2878f2da037ea4721464bef2ea4
  • Status changed from new to needs_review

New commits:

3f66f35Abstract class for channels
df9fd2cNew channel: static error rate
4da3b35Changed naming conventions and added new channel: ErrorErasure
62a2447Channels are now part of the reference manual for coding. Added imports.

comment:3 Changed 6 years ago by vdelecroix

  • Status changed from needs_review to needs_work

Hello,

  1. You might consider inheriting from SageObject (sage.structure.sage_object).
  1. I would rather hide them under codes.channels.<name>. That way you can avoid the file channels_catalog.
  1. Why sage.coding.channel_constructions and not sage.coding.channels?
  1. Why not setting __call__ as an alias of transmit. That would allow
    sage: channel(msg)
    
    which is convenient.
  1. Would be cool to have cross references with actual codes in the documentation. Right now the documentation looks like this is an isolated block.
  1. You could add various tests to check that if you reset the seed you get the same answer. Might be useful to have this doctested and also for users to know that they can use it.
    sage: set_random_seed(10)
    sage: randint(0,5)
    5
    sage: set_random_seed(10)
    sage: randint(0,5)
    5
    sage: set_random_seed(10)
    sage: randint(0,5)
    5
    
  1. In _random_error_vector why not starting with
    vect = [F.random_element() for _ in range(n)]
    
  1. The functions _random_error_position, _random_error_vector and _tuple_to_integer will not appear in the documentation. You should either remove the _ at the begining or add some automethod sphinx directives (you have to look in the code to find how to do it)
  1. The function _random_error_position should be moved to prandom. It is useful in its own and quite independent. Note that it is quite similar to
    sage: sample(range(10), 4)
    [1, 7, 6, 4]
    
  1. For _random_error_vector I would rather use F._nonzero_random_element() that is useful in your case
    sage: GF(2)._random_nonzero_element()
    1
    
    of course the default implementation is to call .random_element until something nonzero is found. Hence you can use
  1. Why do you need this _tuple_to_integer? Why not using directly randint(*value)?
    sage: value = [0,5]
    sage: randint(*value)
    3
    sage: value = (0,5)
    sage: randint(*value)
    4
    

Eleven is a nice prime number. So I will stop there.

Vincent

comment:4 Changed 6 years ago by dlucas

Hello,

Thank you for the input!

I would rather hide them under codes.channels.<name>. That way you can avoid the file channels_catalog. Why sage.coding.channel_constructions and not sage.coding.channels?

My idea here was to mimic the way codes were stored and hide: a code_constructions file for the codes, and a codes_catalog file for the list of codes. I wanted to do the same with channels for consistency. That's why I chose the name channel_constructions. Hiding the channels under codes.channels.<name> might be confusing (imho): one can think that channels depends on codes, which is not the case as they are completely independant objects. So they can be used on any vector, not only on codewords, or for experimentation on decoding. Which also explains why the documentation looks like an isolated block, I wrote it as a new object, independant from codes.

The functions _random_error_position, _random_error_vector and _tuple_to_integer will not appear in the documentation. You should either remove the _ at the begining or add some automethod sphinx directives (you have to look in the code to find how to do it)

Well, that was done on purpose as one should not use these methods except when implementing a new Channel object. Now, with a second thought about this, it's not that smart as one will have to browse through the code to find these methods. I think I'll remove the underscore to make them appear in the doc, while putting the statement "This is a helper function, for internal use only." in a ..NOTE:: block to highlight this.

For the rest of your comments, thanks a lot for the advice on these random issues. I'll change the code accordingly.

David

comment:5 Changed 6 years ago by git

  • Commit changed from 62a2447c45d3d2878f2da037ea4721464bef2ea4 to 874deac06298703bc9cd9d8354302830739d6089

Branch pushed to git repo; I updated commit sha1. New commits:

934f6c5Moved random_error_position to prandom
5ee95d7Changed code of random_error_vector
874deacAbstractChannel now inherits from SageObject. Added an alias to transmit method.

comment:6 Changed 6 years ago by vdelecroix

random_error_position is not an appropriate name. Perhaps range_sample or sample_range. The description is not appropriate as well. Should be something like

def sample_range(n, k):
    """
    Return a subset of [0,...,n-1] of size k with uniform probability.

    INPUT:

    - ``n``, ``k`` - integers

    OUTPUT:

    An ordered list.   

    AUTHORS:

    This function is taken from codinglib (https://bitbucket.org/jsrn/codinglib/)
    written by Johan Nielsen.
"""

In the same function you should also:

  • check the input (just do n = int(n); k = int(k))
  • rename the variable error_position into something more meaningful

And for the line sage:set_random_seed(10) there must be a space sage: set_random_seed(10).

comment:7 Changed 6 years ago by dlucas

I did some changes, as stated by commit messages above.

Some remarks:

  • I kept tuple_to_integer because the user can pass either an integer or an interval as number_errors parameter for the channels. If the user passes an integer, then randint won't work. Of course it is possible to do the check anytime you call number_errors (if integer, return it, else do a randint and return the result) but it's rather tedious. Especially because it's something that will be used by a some channels. I think it's easier to call a single method rather than copying the if statement each time. Maybe tuple_to_integer is not really a good name though.
  • random_error_vector(n, F, error_positions) returns a vector of length n over F with 0s in every position except those listed in error_positions. Because of this, calling vect = [F.random_element() for _ in range(n)] in random_error_vector does not really fit, because it will create a vector with non-controlled zero positions, so we'll need to set all positions except those into error_positions afterwards. I think create a zero vector of size n and add random values at some positions is better.
  • I did not change the documentation, nor the catalog stuff (see comment 4). If you think that Channels are actually more linked to codes than to anything else, I can make some changes.

David

comment:8 Changed 6 years ago by git

  • Commit changed from 874deac06298703bc9cd9d8354302830739d6089 to 5f31c5cef52b9b0bdbd905f4ff132e344446e766

Branch pushed to git repo; I updated commit sha1. New commits:

5f31c5cChanged name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces

comment:9 Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_info

New commits:

5f31c5cChanged name of random_error_position to sample_range, updated documentation, propaged naiming change. Removed some whitespaces

comment:10 follow-up: Changed 6 years ago by vdelecroix

Replying to dlucas:

I did some changes, as stated by commit messages above.

Some remarks:

  • I kept tuple_to_integer because the user can pass either an integer or an

interval as number_errors parameter for the channels. If the user passes an integer, then randint won't work. Of course it is possible to do the check anytime you call number_errors (if integer, return it, else do a randint and return the result) but it's rather tedious. Especially because it's something that will be used by a some channels. I think it's easier to call a single method rather than copying the if statement each time. Maybe tuple_to_integer is not really a good name though.

The name is very bad. Moreover you can convert an integer a into the tuple (a,a) in the constructor. With your way of doing it you avoid the function call to randint but with your version you always add an extra function call to tuple_to_integer.

You should replace all occurrence of hasattr(X, '__iter__') by isinstance(X, (tuple, list)). Having an __iter__ method does not charaterize tuples.

  • random_error_vector(n, F, error_positions) returns a vector of length n

over F with 0s in every position except those listed in error_positions. Because of this, calling `vect = [F.random_element() for _ in range(n)] in random_error_vector` does not really fit, because it will create a vector with non-controlled zero positions, so we'll need to set all positions except those into error_positions afterwards. I think create a zero vector of size n and add random values at some positions is better.

Yes. You are right!

  • I did not change the documentation, nor the catalog stuff (see comment 4).

If you think that Channels are actually more linked to codes than to anything else, I can make some changes

You should tell me ;-) Cryptogtaphers might like it but they do use strings not vectors in a vector space over a finite field.

  1. When you do __call__ = transmit the documentation is copied as well. Did you try
    sage: Chan = channels.StaticErrorRateChannel(GF(11), 2)
    sage: Chan.__call__?
    
  1. I would rather implement StaticErrorRateChannel.transmit_unsafe as
    def transmit_unsafe(self, msg):
        w = msg.copy()
        V = self.input_space()
        for i in sample_range(V.dim(), randint(*self.nb_errors())):
            w[i] = V.random_element()
        return w 
    
    it would be more direct. Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?
  1. In the main documentation class, it is useful to document the main methods. In other words, when reading sage: channels.StaticErrorChannel? I should be able to discvoer the method transmit (and its alias usage with __call__).
  1. In the very top documentation, you can have links by writing class:`AbstractChannel` instead of *AbstractChannel*.

Vincent

comment:11 Changed 6 years ago by vdelecroix

  • Status changed from needs_info to needs_work

comment:12 in reply to: ↑ 10 ; follow-up: Changed 6 years ago by vdelecroix

  1. I would rather implement StaticErrorRateChannel.transmit_unsafe as
    def transmit_unsafe(self, msg):
        w = msg.copy()
        V = self.input_space()
        for i in sample_range(V.dim(), randint(*self.nb_errors())):
            w[i] = V.random_element()
        return w 
    
    it would be more direct. Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?

Yes it is (the solution is not exactly linear though)

sage: S = Subsets(100,4)
sage: timeit("S.random_element()")
625 loops, best of 3: 54.2 µs per loop
sage: timeit("sample_range(100,4)")
625 loops, best of 3: 57.7 µs per loop
sage: S = Subsets(1000,4)
sage: timeit("S.random_element()")
625 loops, best of 3: 61.2 µs per loop
sage: timeit("sample_range(1000,4)")
625 loops, best of 3: 510 µs per loop

It is roughly the same speed for n=100 but for n=1000 yours is twice slower.

comment:13 in reply to: ↑ 12 Changed 6 years ago by vdelecroix

Replying to vdelecroix:

Moreover it shows that sample_range would better be an iterator instead of returning a list. By the way, this function sample_range has currently linear complexity in n, isn't it possible to make it linear in k?

Yes it is (the solution is not exactly linear though) ... It is roughly the same speed for n=100 but for n=1000 yours is twice slower.

This code for subsets is just using

sage: sample(range(n), k)

So I guess this is currently the best way to do it. And moreover, looking at Python doc

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population:   sample(xrange(10000000), 60)

Sorry for my previous remarks. You should just get rid of sample_range(n,k) and use sample(xrange(n),k).

Vincent

comment:14 Changed 6 years ago by git

  • Commit changed from 5f31c5cef52b9b0bdbd905f4ff132e344446e766 to f93852e00d00dd694bed6c068d87cd17ad1a5d3c

Branch pushed to git repo; I updated commit sha1. New commits:

f515e4dRemoved __call__ = transmit
87667ebRemoved sample_range method, propaged changes and changed code for transmit_unsafe in StaticErrorRateChannel
f9b07beRemoved tuple_to_integer method
f93852eImproved documentation

comment:15 Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_review

Thank you again for the advice!

The only thing I did not do is to add cross-references between channels and codes. I think it's not the best time to do it, as there is no proper Encoder/Decoder? structure. But when we will propose our structure for decoding, I think it will be interesting to illustrate the error correction using channels to create errors. Before that, I let it as is.

comment:16 Changed 6 years ago by vdelecroix

  • Status changed from needs_review to needs_work

Hello,

  1. I do not like this AbstractChannel. If you look Morphism the top (abstract) class is not AbstractMorphism but Morphism (see sage.categories.morphism.Morphism).
  1. In my remark 12 I was suggesting to actually remove
    def __call__(self, msg):
        return self.transmit(msg)
    
    and keep only
    __call__ = transmit
    
    It avoids duplication of the documentation.
  1. You can remove the raise NotImplementedError in the abstract method. By construction
    sage: class A:
    ....:     @abstract_method
    ....:     def ploum_ploum(self):
    ....:         "a hello function!"
    sage: a = A()
    sage: a.ploum_ploum()
    Traceback (most recent call last):
    ...
    NotImplementedError: <abstract method ploum_ploum at 0x7f1db2e02758>
    
  1. In
    ValueError: The total number of errors and erasures can
    exceed the dimension of the input space
    
    shouldn't it be can not?
  1. Do you really care about equality (i.e. the method __eq__)? If you do not then remove the method. If you do then you must also implement __ne__ (and possibly __hash__). Otherwise
    sage: class A:
    ....:     def __eq__(self, other):
    ....:         return True
    sage: a = A()
    sage: b = A()
    sage: a == b
    True
    sage: a != b
    True
    

Further remarks for later:

  • If you look at the table of contents of coding you have
    - Index of Channels
    - Channels
    - Index of Codes
    - Linear Codes
    ...
    
    Not very nice. Though not for this ticket. Think about it for later on the table of contents of code documentation is a mess! But of course the priority is to clean the code.
  • A more global remark: it is generally a bad strategy to implement an isolated block of code and then to try to mix things later. Especially if the code in itself is basically useless.

Vincent

comment:17 Changed 6 years ago by git

  • Commit changed from f93852e00d00dd694bed6c068d87cd17ad1a5d3c to 2bc05906eb2db277bac427ec2ea55a47f95f1f26

Branch pushed to git repo; I updated commit sha1. New commits:

7479f9fRenamed AbstractChannel as Channel, removed __call__ as a defined method to avoid duplication in the documentation
2bc0590Small changes to documentation. Removed __eq__ methods

comment:18 follow-up: Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_review

If you look at the table of contents of coding you have [...] Not very nice. Though not for this ticket. Think about it for later on the table of contents of code documentation is a mess! But of course the priority is to clean the code.

I agree. It's definitely something to keep in mind.

A more global remark: it is generally a bad strategy to implement an isolated block of code and then to try to mix things later. Especially if the code in itself is basically useless.

Got it!

I did the suggested changes in the code. I removed __eq__, as I do not really picture a case in which one would need to compare two channels.

David

comment:19 in reply to: ↑ 18 Changed 6 years ago by vdelecroix

Replying to dlucas:

I removed __eq__, as I do not really picture a case in which one would need to compare two channels.

Hence my question: it will really depend on its usage!

Vincent

comment:20 Changed 6 years ago by vdelecroix

  • Authors set to David Lucas
  • Reviewers set to Vincent Delecroix
  • Status changed from needs_review to needs_work

Hello,

For Sage objects, you should use _repr_ and not __repr__. It is useful for example if you want to set a custom name

sage: class A(SageObject):
....:     def _repr_(self):
....:         return "Default repr"
sage: a = A()
Default repr
sage: a.rename('I am not me')
sage: a
I am not me
sage: a.rename(None)
sage: a
Default repr

There are two occurrences + one mention in the documentation.

Beyond that it looks good!

Vincent

comment:21 Changed 6 years ago by git

  • Commit changed from 2bc05906eb2db277bac427ec2ea55a47f95f1f26 to 723a55c71861db896335aa75763e8bba85a2a3b0

Branch pushed to git repo; I updated commit sha1. New commits:

723a55cReplaced __repr__ by _repr_

comment:22 Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_review

Hello,

For Sage objects, you should use _repr_ and not __repr__.

Ok, good to know! Modification done.

comment:23 Changed 6 years ago by vdelecroix

  • Status changed from needs_review to needs_work
  1. In each class/function/method the documentation should be:
    • one line description
    • then more details in one or several paragraphs if needed. It is not the case in StaticErrorRateChannel and ErrorErasureChannel.

Moreover, in a class you should not write as a description Construct a X. It is rather simply X. The reason is because it is the documentation of the class itself that you will see with:

sage: Chan = channels.ErrorErasureChannel(GF(59)^40, 2, 2)
sage: Chan?<enter>

(BTW there is a nice short cut to build vector spaces: K^n just works!)

  1. You can simplify a bit the ErrorErasureChannel._repr_. By doing things in two parts
    def format_interval(t):
        return str(t[0]) if t[0] == t[1] else 'between %s and %s' %(t[0], t[1])
    
    def _repr_():
        return "Error-and-erasure channel creating %s errors and %s erasures' %(
                  format_interval(self.number_errors()),
                  format_interval(self.number_erasures()))
    
    And if you really care about pluralization
    def format_interval(t, singular, plural):
        name = singular if t[1] < 2 else plural
        if t[0] == t[1]:
            return '%s %s' %(t[0], name)
        else:
            return 'between %s and %s %s' %(t[0], t[1], name)
    
    Such format_interval can be used in all your _repr_ methods.
  1. You can simplify ErrorErasureChannel.transmit_unsafe. Notice that the output of sample is already randomized (you can have a look at the code source which is pure python) and it is specified in its documentation
    The resulting list is in selection order so that
    all sub-slices will also be valid random samples
    
    Hence, you can simply cut your errors into two parts as
    errors = sample(xrange(n), number_errors + number_erasures)
    error_positions   = errors[:number_errors]
    erasure_positions = errors[number_errors:]
    

comment:24 Changed 6 years ago by git

  • Commit changed from 723a55c71861db896335aa75763e8bba85a2a3b0 to 768ed9cd9e827fa833c9244cfade1ea63e703216

Branch pushed to git repo; I updated commit sha1. New commits:

23dd71cSimplified _repr_ and _latex_ methods using a new helper function
cb7b137Some changes to the documentation
768ed9cSimplified code in ErrorErasureChannel.transmit_unsafe

comment:25 Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_review

comment:26 Changed 6 years ago by vdelecroix

  • Status changed from needs_review to needs_work
  1. In the string representation of channels the input space/output space does not appear
    sage: n_err, n_era = 21, 21
    sage: Chan = channels.ErrorErasureChannel(GF(59)^50, n_err, n_era)
    sage: Chan
    Error-and-erasure channel creating 21 errors and 21 erasures
    
    i.e. no mention of GF(59)^50 in the above example. Is that what you want?
  1. For the ErrorErasureChannel the output space is CartesianProduct. For mysterious reason, it is adviced to use cartesian_product from sage.categories.cartesian_product (see #15425).
  1. In ErrorErasureChannel.transmit_unsafe you would better replace
    • src/sage/coding/channel_constructions.py

      diff --git a/src/sage/coding/channel_constructions.py b/src/sage/coding/channel_constructions.py
      index 0d32ffc..8935db7 100644
      a b class ErrorErasureChannel(Channel): 
      592592
       593        zero = V.base_ring().zero()
      593594        for i in erasure_positions:
      594             message[i] = 0
       595            message[i] = zero
      595596        return message, erasure_vector
    it is cleaner and potentially faster (not much).
  1. You should add examples that check that the input vector is not modified, ie
        sage: v = my_funny_vector()
        sage: Chan(v)
        ... my_funny_output ....
    
    Check that the input ``v`` is not modified::
    
        sage: v
        ... should be the initial vector ...
    
Last edited 6 years ago by vdelecroix (previous) (diff)

comment:27 Changed 6 years ago by git

  • Commit changed from 768ed9cd9e827fa833c9244cfade1ea63e703216 to 383e67c01119ce17f0ab58fd0833016a9bcdad01

Branch pushed to git repo; I updated commit sha1. New commits:

c3b5555Replaced CartesianProduct by cartesian_product
6dd51f3New test in transmit method. Replaced 0 by the zero of the base ring of the input space in ErrorErasureChannel.transmit_unsafe
383e67cChanged _repr_ and _latex_ methods

comment:28 Changed 6 years ago by dlucas

  • Status changed from needs_work to needs_review

For the ErrorErasureChannel the output space is CartesianProduct. For mysterious reason, it is adviced to use cartesian_product from sage.categories.cartesian_product (see #15425).

Thanks for the information.

I did the changes (and added input and output space in the representation methods for channels).

comment:29 Changed 6 years ago by vdelecroix

  • Status changed from needs_review to positive_review

Salut David,

All right! This ticket is good to go.

That would be nice to start reviewing other's ticket as well. You can have a look at the section "Reviewer Checklist" in the Sage's Developer Guide. If you want to start slowly, there are some tickets that are marked "beginner". Basically: one of your ticket gets reviewed -> you review one ticket.

Best, Vincent

comment:30 follow-up: Changed 6 years ago by dlucas

Hello,

Basically: one of your ticket gets reviewed -> you review one ticket.

Ok. I'll start reviewing then :)

comment:31 in reply to: ↑ 30 Changed 6 years ago by vdelecroix

Replying to dlucas:

Hello,

Basically: one of your ticket gets reviewed -> you review one ticket.

Ok. I'll start reviewing then :)

Cool :-) If you have doubts, hesitations, or anything do not hesitate to post a message to sage-devel@…

comment:32 Changed 6 years ago by vbraun

  • Branch changed from u/dlucas/channels to 383e67c01119ce17f0ab58fd0833016a9bcdad01
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.