Opened 10 years ago

Closed 10 years ago

# Bug in Huffman algorithm

Reported by: Owned by: jsrn wdj major sage-4.7 coding theory huffman ncohen, mvngu sage-4.7.alpha3 Nathann Cohen Johan Sebastian Rosenkilde Nielsen N/A

There seems to be a bug in the Huffman build algorithm when given a frequency dictionary.

The following example results in a wrong encoding table: Let there be 10 symbols numbered 1,..,10 where number i occurs with probability i/55. The Huffman table I end up with, manually running the algorithm is something like the following:

```{ 1: '00000', 2: '00001', 3: '0001', 4: '1000', 5: '1001',
6: '001', 7: '110', 8: '111', 9: '101', 10: '01' }
```

which has expected length 173/55.

The current Huffman-algorithm returns

```{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000',
6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}
```

which has expected length 175/55.

Apply :

• trac_10864.patch
• trac_10864-2.patch

### comment:1 Changed 10 years ago by ncohen

• Milestone set to sage-4.6.2
• Status changed from new to needs_review

This happens when Sage is fed with a dictionnary where it expects a string.

```sage: freq = dict([(i,i) for i in range(1,11)])
sage: freq
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}

sage: Huffman(freq).encoding_table()
{1: '1100', 2: '1101', 3: '1110', 4: '1111', 5: '000', 6: '001', 7: '010', 8: '011', 9: '100', 10: '101'}

sage: Huffman(table = freq).encoding_table()
{1: '01110', 2: '01111', 3: '0110', 4: '1110', 5: '1111', 6: '010', 7: '100', 8: '101', 9: '110', 10: '00'}
sage: enc = Huffman(table = freq).encoding_table()
sage: sum(map(lambda (x,y): x*len(y),enc.items()))
173
```

This would take half a second to notice if only Sage was returning an error rather than work on a dictionary instead of a string. This patch should avoid this problem in the future `:-)`

Nathann

### comment:2 Changed 10 years ago by ncohen

• Authors set to Nathann Cohen

### comment:4 Changed 10 years ago by mvngu

• Milestone changed from sage-4.6.2 to sage-4.7

### comment:5 Changed 10 years ago by ncohen

Oh.. Thanks ! :-)

### comment:6 follow-up: ↓ 8 Changed 10 years ago by dsm

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary. And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

### comment:7 Changed 10 years ago by jsrn

Ok, I see that I used the class in the wrong way :-)

Wouldn't it be more user-friendly to change the Huffman constructor to accept only one unnamed argument, and then treat it according to type, e.g:

```def __init__(self, source):
if isinstance(source, string):
# init by string
elif isinstance(source, dict):
# init by dict
else:
# give the user an error
```

(Because of deprecation policy, I guess we would need to support the current interface in some way, which would make the code a bit messy, but that will only be for a year)

### comment:8 in reply to: ↑ 6 ; follow-up: ↓ 9 Changed 10 years ago by ncohen

Hello !

Minor: the doctest says " Feeding a dictionary instead of a string:: ", but then feeds it a list of strings, not a dictionary.

Right. I just thought : "Let's feed it anything but a string"

And I don't know if this was intended, but the use of isinstance(string,str) rules out the use of unicode strings.

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

Nathann

### comment:9 in reply to: ↑ 8 Changed 10 years ago by dsm

I don't know what an unicode string is, and I will try to find it out immediately. How do you think we should filter it then ?

I think the usual idiom in python 2 is "isinstance(s, basestring)" to allow both.

### comment:10 Changed 10 years ago by ncohen

Hello !!

This is an updated patch in which string has been replaced by basename, and an unique argument is expected by the constructor.

This class being pretty new and not really advertised, perhaps we can do without backward compatibility for once ?.. `:-p`

I mean, this class is perhaps only useful to illustrate what the Huffman algorithm is (slow encoding in Python..), and updating a possibly uncompatible (if it exists somewhere in the world) should not take more than a few seconds `:-)`

Nathann

### comment:11 Changed 10 years ago by jsrn

Nice patch. I agree with sidestepping the deprecation policy :-) I'll try and review once my 4.6.2 finish compiling :-P

Johan

### comment:12 follow-up: ↓ 13 Changed 10 years ago by jsrn

• Status changed from needs_review to positive_review

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

### comment:13 in reply to: ↑ 12 Changed 10 years ago by mvngu

• Status changed from positive_review to needs_work

Yep, works fine for me on 4.6.2, and code looks fine as well. Green light.

I object to the positive review. Note that the documentation for the class `Huffman` is out of sync with the definition of that class. For instance, the documentation still refers to the deleted argument `table`. Sorry for being nasty, but this ticket goes to "needs_work".

### comment:14 Changed 10 years ago by ncohen

• Status changed from needs_work to needs_review

Argggggg !!! Sorry about that !!!

I had been looking for occurrences of table = in the code to replace them by source or remove them, and I forgot some occurrences of table in the documentation.... I hope there is none left `:-)`

Nathann

### comment:15 Changed 10 years ago by ncohen

• Description modified (diff)

### comment:16 follow-up: ↓ 17 Changed 10 years ago by jsrn

• Status changed from needs_review to positive_review

Woops, sorry about that. Good thing you're awake Minh :-)

I reread the whole thing with the new patch, and retested, rebuilt and redoc'ed. It seems alright now.

### comment:17 in reply to: ↑ 16 Changed 10 years ago by ncohen

Woops, sorry about that. Good thing you're awake Minh :-)

Minh is always awake. Minh does not rest, and sees everything. Minh is like a better version of Chuck Nurris invented by Chuck Nurris `O_O`

Nathann

### comment:18 Changed 10 years ago by jdemeyer

• Reviewers set to Johan Sebastian Rosenkilde Nielsen

### comment:19 Changed 10 years ago by jdemeyer

• Merged in set to sage-4.7.alpha3
• Resolution set to fixed
• Status changed from positive_review to closed
Note: See TracTickets for help on using tickets.