Opened 8 years ago

Closed 8 years ago

#10864 closed defect (fixed)

Bug in Huffman algorithm

Reported by: jsrn Owned by: wdj
Priority: major Milestone: sage-4.7
Component: coding theory Keywords: huffman
Cc: ncohen, mvngu Merged in: sage-4.7.alpha3
Authors: Nathann Cohen Reviewers: Johan Sebastian Rosenkilde Nielsen
Report Upstream: N/A Work issues:
Branch: Commit:
Dependencies: Stopgaps:

Description (last modified by ncohen)

There seems to be a bug in the Huffman build algorithm when given a frequency dictionary.

The following example results in a wrong encoding table: Let there be 10 symbols numbered 1,..,10 where number i occurs with probability i/55. The Huffman table I end up with, manually running the algorithm is something like the following:

{ 1: '00000', 2: '00001', 3: '0001', 4: '1000', 5: '1001',
  6: '001', 7: '110', 8: '111', 9: '101', 10: '01' }

which has expected length 173/55.

The current Huffman-algorithm returns

{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000',
 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}

which has expected length 175/55.

Apply :

  • trac_10864.patch
  • trac_10864-2.patch

Attachments (2)

trac_10864.patch (5.8 KB) - added by ncohen 8 years ago.
trac_10864-2.patch (1.3 KB) - added by ncohen 8 years ago.

Download all attachments as: .zip

Change History (21)

comment:1 Changed 8 years ago by ncohen

  • Milestone set to sage-4.6.2
  • Status changed from new to needs_review

This happens when Sage is fed with a dictionnary where it expects a string.

sage: freq = dict([(i,i) for i in range(1,11)])
sage: freq
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}

sage: Huffman(freq).encoding_table()
{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000', 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}

sage: Huffman(table = freq).encoding_table()
{1: '01110', 2: '01111', 3: '0110', 4: '1110', 5: '1111', 6: '010', 7: '100', 8: '101', 9: '110', 10: '00'}
sage: enc = Huffman(table = freq).encoding_table()
sage: sum(map(lambda (x,y): x*len(y),enc.items()))
173

This would take half a second to notice if only Sage was returning an error rather than work on a dictionary instead of a string. This patch should avoid this problem in the future :-)

Nathann

comment:2 Changed 8 years ago by ncohen

  • Authors set to Nathann Cohen

comment:3 Changed 8 years ago by mvngu

  • Cc mvngu added

comment:4 Changed 8 years ago by mvngu

  • Milestone changed from sage-4.6.2 to sage-4.7

comment:5 Changed 8 years ago by ncohen

Oh.. Thanks ! :-)

comment:6 follow-up: Changed 8 years ago by dsm

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary. And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

comment:7 Changed 8 years ago by jsrn

Ok, I see that I used the class in the wrong way :-)

Wouldn't it be more user-friendly to change the Huffman constructor to accept only one unnamed argument, and then treat it according to type, e.g:

def __init__(self, source):
  if isinstance(source, string):
     # init by string
  elif isinstance(source, dict):
     # init by dict
  else:
     # give the user an error

(Because of deprecation policy, I guess we would need to support the current interface in some way, which would make the code a bit messy, but that will only be for a year)

comment:8 in reply to: ↑ 6 ; follow-up: Changed 8 years ago by ncohen

Hello !

Replying to dsm:

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary.

Right. I just thought : "Let's feed it anything but a string"

And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

Nathann

comment:9 in reply to: ↑ 8 Changed 8 years ago by dsm

Replying to ncohen:

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

I think the usual idiom in python 2 is "isinstance(s, basestring)" to allow both.

comment:10 Changed 8 years ago by ncohen

Hello !!

This is an updated patch in which string has been replaced by basename, and an unique argument is expected by the constructor.

This class being pretty new and not really advertised, perhaps we can do without backward compatibility for once ?.. :-p

I mean, this class is perhaps only useful to illustrate what the Huffman algorithm is (slow encoding in Python..), and updating a possibly uncompatible (if it exists somewhere in the world) should not take more than a few seconds :-)

Nathann

Changed 8 years ago by ncohen

comment:11 Changed 8 years ago by jsrn

Nice patch. I agree with sidestepping the deprecation policy :-) I'll try and review once my 4.6.2 finish compiling :-P

Johan

comment:12 follow-up: Changed 8 years ago by jsrn

  • Status changed from needs_review to positive_review

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

comment:13 in reply to: ↑ 12 Changed 8 years ago by mvngu

  • Status changed from positive_review to needs_work

Replying to jsrn:

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

I object to the positive review. Note that the documentation for the class Huffman is out of sync with the definition of that class. For instance, the documentation still refers to the deleted argument table. Sorry for being nasty, but this ticket goes to "needs_work".

comment:14 Changed 8 years ago by ncohen

  • Status changed from needs_work to needs_review

Argggggg !!! Sorry about that !!!

I had been looking for occurrences of table = in the code to replace them by source or remove them, and I forgot some occurrences of table in the documentation.... I hope there is none left :-)

Nathann

comment:15 Changed 8 years ago by ncohen

  • Description modified (diff)

Changed 8 years ago by ncohen

comment:16 follow-up: Changed 8 years ago by jsrn

  • Status changed from needs_review to positive_review

Woops, sorry about that. Good thing you're awake Minh :-)

I reread the whole thing with the new patch, and retested, rebuilt and redoc'ed. It seems alright now.

comment:17 in reply to: ↑ 16 Changed 8 years ago by ncohen

Replying to jsrn:

Woops, sorry about that. Good thing you're awake Minh :-)

Minh is always awake. Minh does not rest, and sees everything. Minh is like a better version of Chuck Nurris invented by Chuck Nurris O_O

Nathann

comment:18 Changed 8 years ago by jdemeyer

  • Reviewers set to Johan Sebastian Rosenkilde Nielsen

comment:19 Changed 8 years ago by jdemeyer

  • Merged in set to sage-4.7.alpha3
  • Resolution set to fixed
  • Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.