Opened 10 years ago
Closed 10 years ago
#10864 closed defect (fixed)
Bug in Huffman algorithm
Reported by: | jsrn | Owned by: | wdj |
---|---|---|---|
Priority: | major | Milestone: | sage-4.7 |
Component: | coding theory | Keywords: | huffman |
Cc: | ncohen, mvngu | Merged in: | sage-4.7.alpha3 |
Authors: | Nathann Cohen | Reviewers: | Johan Sebastian Rosenkilde Nielsen |
Report Upstream: | N/A | Work issues: | |
Branch: | Commit: | ||
Dependencies: | Stopgaps: |
Description (last modified by )
There seems to be a bug in the Huffman build algorithm when given a frequency dictionary.
The following example results in a wrong encoding table: Let there be 10 symbols numbered 1,..,10 where number i occurs with probability i/55. The Huffman table I end up with, manually running the algorithm is something like the following:
{ 1: '00000', 2: '00001', 3: '0001', 4: '1000', 5: '1001', 6: '001', 7: '110', 8: '111', 9: '101', 10: '01' }
which has expected length 173/55.
The current Huffman-algorithm returns
{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000', 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}
which has expected length 175/55.
Apply :
- trac_10864.patch
- trac_10864-2.patch
Attachments (2)
Change History (21)
comment:1 Changed 10 years ago by
- Milestone set to sage-4.6.2
- Status changed from new to needs_review
comment:2 Changed 10 years ago by
comment:3 Changed 10 years ago by
- Cc mvngu added
comment:4 Changed 10 years ago by
- Milestone changed from sage-4.6.2 to sage-4.7
comment:5 Changed 10 years ago by
Oh.. Thanks ! :-)
comment:6 follow-up: ↓ 8 Changed 10 years ago by
Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary. And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.
comment:7 Changed 10 years ago by
Ok, I see that I used the class in the wrong way :-)
Wouldn't it be more user-friendly to change the Huffman constructor to accept only one unnamed argument, and then treat it according to type, e.g:
def __init__(self, source): if isinstance(source, string): # init by string elif isinstance(source, dict): # init by dict else: # give the user an error
(Because of deprecation policy, I guess we would need to support the current interface in some way, which would make the code a bit messy, but that will only be for a year)
comment:8 in reply to: ↑ 6 ; follow-up: ↓ 9 Changed 10 years ago by
Hello !
Replying to dsm:
Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary.
Right. I just thought : "Let's feed it anything but a string"
And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.
I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?
Nathann
comment:9 in reply to: ↑ 8 Changed 10 years ago by
Replying to ncohen:
I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?
I think the usual idiom in python 2 is "isinstance(s, basestring)" to allow both.
comment:10 Changed 10 years ago by
Hello !!
This is an updated patch in which string has been replaced by basename, and an unique argument is expected by the constructor.
This class being pretty new and not really advertised, perhaps we can do without backward compatibility for once ?.. :-p
I mean, this class is perhaps only useful to illustrate what the Huffman algorithm is (slow encoding in Python..), and updating a possibly uncompatible (if it exists somewhere in the world) should not take more than a few seconds :-)
Nathann
Changed 10 years ago by
comment:11 Changed 10 years ago by
Nice patch. I agree with sidestepping the deprecation policy :-) I'll try and review once my 4.6.2 finish compiling :-P
Johan
comment:12 follow-up: ↓ 13 Changed 10 years ago by
- Status changed from needs_review to positive_review
Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.
comment:13 in reply to: ↑ 12 Changed 10 years ago by
- Status changed from positive_review to needs_work
Replying to jsrn:
Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.
I object to the positive review. Note that the documentation for the class Huffman
is out of sync with the definition of that class. For instance, the documentation still refers to the deleted argument table
. Sorry for being nasty, but this ticket goes to "needs_work".
comment:14 Changed 10 years ago by
- Status changed from needs_work to needs_review
Argggggg !!! Sorry about that !!!
I had been looking for occurrences of table =
in the code to replace them by
source
or remove them, and I forgot some occurrences of
table
in the documentation.... I hope there is none left
:-)
Nathann
comment:15 Changed 10 years ago by
- Description modified (diff)
Changed 10 years ago by
comment:16 follow-up: ↓ 17 Changed 10 years ago by
- Status changed from needs_review to positive_review
Woops, sorry about that. Good thing you're awake Minh :-)
I reread the whole thing with the new patch, and retested, rebuilt and redoc'ed. It seems alright now.
comment:17 in reply to: ↑ 16 Changed 10 years ago by
Replying to jsrn:
Woops, sorry about that. Good thing you're awake Minh :-)
Minh is always awake. Minh does not rest, and sees everything. Minh is like a better version of Chuck Nurris invented by Chuck Nurris O_O
Nathann
comment:18 Changed 10 years ago by
- Reviewers set to Johan Sebastian Rosenkilde Nielsen
comment:19 Changed 10 years ago by
- Merged in set to sage-4.7.alpha3
- Resolution set to fixed
- Status changed from positive_review to closed
This happens when Sage is fed with a dictionnary where it expects a string.
This would take half a second to notice if only Sage was returning an error rather than work on a dictionary instead of a string. This patch should avoid this problem in the future
:-)
Nathann