Opened 9 years ago

Closed 9 years ago

#8765 closed enhancement (fixed)

Huffman Encoding

Reported by: ncohen Owned by: wdj
Priority: major Milestone: sage-4.4.4
Component: coding theory Keywords:
Cc: wdj, mvngu, leif Merged in: sage-4.4.4.alpha0
Authors: Nathann Cohen Reviewers: Minh Van Nguyen
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by mvngu)

This is a basic implementation of Huffman's encoding. May it be useful to teach ! :-)

Apply patches in this order:

  1. trac_8765-huffman.patch
  2. trac_8765-clean-ups.patch

Attachments (3)

trac_8765.patch (11.9 KB) - added by ncohen 9 years ago.
trac_8765-huffman.patch (12.6 KB) - added by mvngu 9 years ago.
new module sage.coding.source_coding.huffman
trac_8765-clean-ups.patch (23.3 KB) - added by mvngu 9 years ago.

Download all attachments as: .zip

Change History (25)

comment:1 Changed 9 years ago by leif

  • Cc leif added

comment:2 Changed 9 years ago by leif

Much room for extensions, e.g.: ;-)

  • accept (also) list of symbols of arbitrary alphabet (type should be checked anyway)
  • binary file (stream) I/O..., with or without encoding table
  • generate encoding/decoding functions

There's actually a possible application within Sage itself: compression of prime_pi and nth_prime tables. (The table-based functions are still work in progress.)

OT:

  • I could perhaps contribute FAX G3 en/decoding, too, if I'm able to find my dead old implementation... :)
  • Another nice thing is encoding and decoding of machine instructions (assembling, disassembling).

-Leif

comment:3 follow-up: Changed 9 years ago by ncohen

  • Status changed from new to needs_review

comment:4 Changed 9 years ago by ncohen

To think I just needed a function for Huffmann encoding yesterday to test something.. Sounds like this file will soon be out of my reach :-D

Nathann

comment:5 in reply to: ↑ 3 Changed 9 years ago by leif

  • Status changed from needs_review to needs_work

Replying to ncohen: Haven't tested yet, but there are at least some typos. (I could fix them, but most probably not today.)

I'd also add [more] type checks on the inputs.

comment:6 Changed 9 years ago by ylchapuy

Hi, I only had a very quick look and I'm a bit concerned about where to put this code in the library.

sage.coding is about channel coding (error correcting code) but this ticket is about source coding (data compression) and I think it's a bad idea to mix those in the library and in the doc.

I would feel a lot more confortable with sage.coding.source_coding.huffman.

Yann

comment:7 follow-up: Changed 9 years ago by ncohen

  • Status changed from needs_work to needs_info

Sounds sensible enough ! I just moved the file in a source_coding subdirectory, but I can not import sage.coding.source_coding.huffman, as sage does not find the source_coding module... Where should I tell it it exists ? :-)

Nathann

comment:8 in reply to: ↑ 7 Changed 9 years ago by wdj

Replying to ncohen:

Sounds sensible enough ! I just moved the file in a source_coding subdirectory, but I can not import sage.coding.source_coding.huffman, as sage does not find the source_coding module... Where should I tell it it exists ? :-)

Nathann

Maybe you could try to add an (empty) init.py file to it before adding huffman.py?

comment:9 follow-up: Changed 9 years ago by ncohen

Hmmm... I added this empty init.py file in source_coding/, but it made no difference... Is that what you meant ? :-/

Nathann

comment:10 in reply to: ↑ 9 Changed 9 years ago by mvngu

Replying to ncohen:

Hmmm... I added this empty init.py file in source_coding/, but it made no difference... Is that what you meant ? :-/

Try putting a comment in that __init__.py file. For example, the content of that init file might be:

# Just a comment so that __init__.py is not an empty file.

comment:11 Changed 9 years ago by ncohen

I just tried it (you were right, the file I created was empty), but noticed no difference... In the end, is it really worth creating a directory anyway ? I could just rename the file to source_coding.py and let this Huffman class stay inside ?..

But I still would like to understand how to get this directory detected :-/

anything to do in modules_list.py even though it is not a Cython file ?

Nathann

comment:12 Changed 9 years ago by ncohen

  • Status changed from needs_info to needs_review

What about this version (no directory, but a file source_coding.py ) ? :-)

comment:13 follow-up: Changed 9 years ago by wdj

This applies to 4.4.rc0 fine and passes sage -testall. I have not checked if the documentation builds okay. Are the other reviewers satisfied with the changes in this last patch?

comment:14 in reply to: ↑ 13 Changed 9 years ago by leif

  • Status changed from needs_review to needs_work

Replying to wdj:

This applies to 4.4.rc0 fine and passes sage -testall. I have not checked if the documentation builds okay.

We should do that before giving a positive review, I guess... :)

Are the other reviewers satisfied with the changes in this last patch?

There are still typos in the description:

  • s/occurence/occurrence/
  • s/frquency/frequency/
  • s/its its/its/ (two times)
  • s/occurencies/occurrences/ (two times)
  • s/eah/each/

encoding_table():

s/each letter/each trained letter/

__init__:

It's not tested if both string and frequencies are given (=> error).

And as I said before I'd prefer type checks on the parameters (rather than relying on automatically raised exceptions later in the code).

I also prefer having the return type documented at the top rather than (more or less) implicitly in the examples (e.g. in frequency_table(); tree() actually returns a DiGraph which happens to also be a tree.)

Doctests?

(Still haven't tested the code, just read the patches, sorry.)

-Leif

comment:15 Changed 9 years ago by ncohen

  • Status changed from needs_work to needs_review

Updated ! :-)

Nathann

Changed 9 years ago by ncohen

comment:16 Changed 9 years ago by leif

from operator import xor... ;-)

I'll test it this evening.

Changed 9 years ago by mvngu

new module sage.coding.source_coding.huffman

comment:17 follow-up: Changed 9 years ago by mvngu

  • Description modified (diff)

The patch trac_8765-huffman.patch replaces the earlier one trac_8765.patch. Now the Huffman module is in the directory source_coding/. Nothing has changed in the replacement patch, except for changes to get Sage to recognize this new module.

comment:18 in reply to: ↑ 17 Changed 9 years ago by leif

Replying to mvngu: I guess frequency_table should be imported in all.py as well.

Have to clone again and import the new patch... ;-)

-Leif

Changed 9 years ago by mvngu

comment:19 follow-up: Changed 9 years ago by mvngu

  • Description modified (diff)
  • Reviewers set to Minh Van Nguyen

Changes in the reviewer patch include:

  1. Add substantially more documentation to the module.
  2. Clean-ups in accordance with PEP 008.
  3. Don't use the module string nor its join function. These have been deprecated. You should now use str and the function "".join.
  4. Get the whole module to 100% coverage.

This means I have reviewed trac_8765-huffman.patch, so only my patch needs review by anyone but me.

comment:20 in reply to: ↑ 19 Changed 9 years ago by leif

Replying to mvngu:

This means I have reviewed trac_8765-huffman.patch, so only my patch needs review by anyone but me.

That patch looks very good. (I wished all module documentations had that quality.)

I'll only add some more doctests and perhaps edit some comments.

I think improvements to the algorithm can be done on another ticket.

-Leif

comment:21 Changed 9 years ago by ncohen

  • Status changed from needs_review to positive_review

The only reason why I still read your patches, apply them, check they are correct, check the docstrings, build the documentation and read it, is that I fear some God may be watching over my shoulder.

Another perfect patch from Minh :-D

Thank you very much !

Nathann

comment:22 Changed 9 years ago by mhansen

  • Merged in set to sage-4.4.4.alpha0
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.