-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arithmetic coding demo #631
base: main
Are you sure you want to change the base?
Conversation
examples/arithmetic-coding.dx
Outdated
First, model the probability of each letter given by the string to be encoded. | ||
|
||
def cumProb (ps: n=>Float) : n=>Float = | ||
withState 0.0 \total. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be computed with Accum
instead of state.
|
||
def getFrequency (str: (Fin l)=>Word8) : Alphabet=>Int = | ||
a: Alphabet => Int = zero | ||
yieldState a \ref. for i. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one can also be computed with a parallel Accum
.
examples/arithmetic-coding.dx
Outdated
|
||
'### Demo: Lossless compression on a test string | ||
|
||
str' = "abbadcabccdd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make a longer test? I'm not convinced that there aren't lurking floating-point issues in this implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A longer test here would fail. I could change all instances of Float to Float64 for more precision and pass the longer test, but the code would look bloated. i.e. this line,
top:Interval = (0., 1.)
would become
top:Interval = (FToF64 0.,FToF64 1.)
Is there a way to define a Float to arbitrary precision? I considered using integer arithmetic instead of floating-point arithmetic for better control over precision, but that code is still a WIP.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm. Would an even longer test then fail even if you used F64? I get that this is just a demo, but if it only works for short sequences we should put some warnings in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I'll probably make a more serious attempt at integer arithmetic before I commit to this implementation and add warnings.
examples/arithmetic-coding.dx
Outdated
|
||
def getProbability (l: Int) (freq: Alphabet=>Int) : Alphabet=>(Float&Float) = | ||
probs = for i. IToF freq.i / IToF l | ||
cums = cumProb probs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to repeat work every time it's called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The probabilities are cached on line 92:
p = getProbability l $ getFrequency str
So it's only calculated once.
examples/arithmetic-coding.dx
Outdated
True -> Continue | ||
False -> | ||
(x, w) = rule.(j@_) in | ||
case code >= x && code < (x+w) of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a binary search?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's actually just linear search on intervals, following after most code implementations I've seen. But maybe it'd scale better to larger alphabets if binary search were implemented instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might be able to use searchSorted
from the prelude:
https://github.com/google-research/dex-lang/blob/main/lib/prelude.dx#L1620
examples/arithmetic-coding.dx
Outdated
the encoded letter. | ||
The decoding process retraces the steps of the encoding process to recover the correct letters. | ||
|
||
def encode (str: (Fin l)=>Word8) (rule: Alphabet=>(Interval->Interval)) : Float = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Fin l
and not just in
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets us this error:
Type error:
Expected: ((Fin a) => Word8)
Actual: (in => Word8)
(Solving for: [a:Int32])
update = subdivide str rule
But yeah, it does look more succinct with in
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. Well it's not a big deal either way.
@cyntsh are you familiar with asymmetric numeral systems (ANS)? This is a more recent and nowadays more widely used alternative to arithmetic coding, and on a high level the only difference is that AC is 'queue-like', or first-in-first-out, whereas ANS is 'stack-like', or last-in-first-out. The power of ANS is roughly equivalent to that of AC, but it is significantly easier to implement (I'm talking here about an 'exact' implementation in terms of integer arithmetic). In case you're interested, I've written a short (< 50 lines), pure functional, pure Python (no imports) ANS implementation which I imagine would not be too difficult to port into Dex: https://github.com/j-towns/ans-notes/blob/master/rans.py. |
Thanks for the suggestion, @j-towns ! Your ANS implementation looks incredibly compact - I'll have a read and give it a try sometime this week. |
@cyntsh how come you can't use the |
@j-towns good point, that does improve the compression accuracy, but still not quite at the level of your python implementation. I suspect that it has to do with comparators (<, >) when the Word64-type integers get large enough. After writing a Word64 instance for Ord, here’s what I get:
|
In this demo, recursive interval subdivision for arithmetic coding is reformulated using fold. Side note: I also found the types very nice to reason with, esp. Interval, which reduces the algorithm to manipulating intervals.