Opened 3 years ago
Closed 3 years ago
#26026 closed defect (fixed)
Updates to Huffman codings
Reported by: | embray | Owned by: | |
---|---|---|---|
Priority: | major | Milestone: | sage-8.5 |
Component: | coding theory | Keywords: | |
Cc: | tscrim | Merged in: | |
Authors: | Erik Bray | Reviewers: | Frédéric Chapoton, Travis Scrimshaw |
Report Upstream: | N/A | Work issues: | |
Branch: | e913f5d (Commits, GitHub, GitLab) | Commit: | e913f5df63139d7e1000adf3dcf7aa51deddeb1f |
Dependencies: | Stopgaps: |
Description
The main goal of these changes were to make this class work on Python 3, which previously it didn't due to comparisons failing while manipulating the heap queue data structures when building the tree. The other problem was that the use of dict iterators meant that the codings weren't quite predictable; sometimes the codes for two characters could be swapped, leading to different test results (granted, not in a way that affects the length of the encodings).
This reimplements building the tree in a way that's deterministic and works the same on Python 2 and 3, and fixes a few other minor issues I noticed.
Change History (13)
comment:1 Changed 3 years ago by
- Commit set to d41dbe25f3480c7853556c012bde80c15741334a
comment:2 Changed 3 years ago by
- Status changed from new to needs_review
comment:3 Changed 3 years ago by
The ticket #18315 also makes extensive changes to the Huffman coding algorithm.
comment:4 follow-up: ↓ 6 Changed 3 years ago by
Oops, I didn't see that one; it is quite old. Are you trying to revive it?
comment:5 Changed 3 years ago by
Looking at the implementation in #18315, it would suffer from the same problems this is trying to fix, so I think this ticket is still relevant, albeit obviously not directly compatible with #18315.
I haven't looked closely at the other coding algorithms added in that ticket but they might have some of the same kinds of problems too. I have a few other nitpicks with that ticket but it looks good on the whole.
comment:6 in reply to: ↑ 4 Changed 3 years ago by
Replying to embray:
Oops, I didn't see that one; it is quite old. Are you trying to revive it?
I haven't looked at it in any real detail yet, and I'm also not knowledgeable on the subject. It did seem worthy of a little more attention though.
comment:7 Changed 3 years ago by
- Milestone changed from sage-8.4 to sage-8.5
comment:8 Changed 3 years ago by
- Status changed from needs_review to needs_work
one trivial failing doctest
comment:9 Changed 3 years ago by
- Branch changed from u/embray/python3/sage-coding-source_coding-huffman to public/ticket/26026
- Commit changed from d41dbe25f3480c7853556c012bde80c15741334a to e913f5df63139d7e1000adf3dcf7aa51deddeb1f
- Status changed from needs_work to needs_review
comment:11 Changed 3 years ago by
please review (this is about a file with many failing doctest in python 3)
comment:12 Changed 3 years ago by
- Reviewers set to Frédéric Chapoton, Travis Scrimshaw
- Status changed from needs_review to positive_review
LGTM.
comment:13 Changed 3 years ago by
- Branch changed from public/ticket/26026 to e913f5df63139d7e1000adf3dcf7aa51deddeb1f
- Resolution set to fixed
- Status changed from positive_review to closed
Branch pushed to git repo; I updated commit sha1. New commits:
improve determinism in Huffman codings
different algorithm without using heap queues
remove stray line in test
explicitly raise an error when trying to make a Huffman code for fewer than 2 symbols (for which the code in this class breaks)