Arithmetic coding demo #631

cyntsh · 2021-08-25T16:09:44Z

In this demo, the recursive interval subdivision at the heart of arithmetic coding is reformulated as a fold. Side note: I also found the types very pleasant to reason with, especially Interval, which reduces the algorithm to manipulating intervals.
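To make "subdivision as a fold" concrete, here is a minimal Python sketch (not the Dex code under review). It assumes an interval is a `(low, width)` pair, matching the `(x, w)` pairs the decoder reads from `rule`, and that each symbol owns a fixed `(low, width)` slice of [0, 1). Encoding then narrows [0, 1) once per symbol via `functools.reduce`. `subdivide`, `encode_interval`, and the model dictionary are hypothetical names. The sketch uses exact `Fraction`s so that rounding cannot interfere.

```python
from fractions import Fraction
from functools import reduce
from typing import Dict, Tuple

# Assumption: an interval is (low, width), like the (x, w) pairs in the decoder.
Interval = Tuple[Fraction, Fraction]

def subdivide(interval: Interval, sym_range: Interval) -> Interval:
    """Narrow `interval` to the slice that `sym_range` occupies within [0, 1)."""
    low, width = interval
    s_low, s_width = sym_range
    return (low + width * s_low, width * s_width)

def encode_interval(message: str, model: Dict[str, Interval]) -> Interval:
    """Fold `subdivide` over the message, starting from the whole interval [0, 1)."""
    top: Interval = (Fraction(0), Fraction(1))
    return reduce(lambda iv, ch: subdivide(iv, model[ch]), message, top)
```

With four equiprobable letters, encoding "ab" first narrows [0, 1) to [0, 1/4) and then to [1/16, 1/8). Any number inside the final interval identifies the message.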

duvenaud · 2021-08-25T17:30:29Z

examples/arithmetic-coding.dx

+First, model the probability of each letter given by the string to be encoded.
+
+def cumProb (ps: n=>Float) : n=>Float =
+  withState 0.0 \total.


I think this can be computed with Accum instead of State.
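For reference, this Python sketch shows the value `cumProb` appears to compute. It illustrates the result, not Dex's Accum or State effects. The diff is truncated, so the sketch assumes `cumProb` returns the lower end of each symbol's interval. That quantity is an exclusive prefix sum: entry i is the total probability of all symbols before i.

```python
from itertools import accumulate
from typing import List

def cum_prob(ps: List[float]) -> List[float]:
    """Exclusive prefix sum: entry i is the total probability of symbols 0..i-1.

    Assumption: this is what `cumProb` is meant to return (the diff is truncated).
    """
    # accumulate(..., initial=0.0) yields len(ps) + 1 running totals;
    # dropping the last one (the grand total, ~1.0) leaves each symbol's lower bound.
    return list(accumulate(ps, initial=0.0))[:-1]
```

For example, `cum_prob([0.25, 0.25, 0.25, 0.25])` gives `[0.0, 0.25, 0.5, 0.75]`.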

duvenaud · 2021-08-25T17:30:54Z

examples/arithmetic-coding.dx

+
+def getFrequency (str: (Fin l)=>Word8) : Alphabet=>Int =
+  a: Alphabet => Int = zero
+  yieldState a \ref. for i. 


This one can also be computed with a parallel Accum.
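`getFrequency` is a histogram, which is why it parallelizes well. Each byte adds 1 to its own bin, and because addition is commutative the order of those updates doesn't matter. The following is a Python sketch of the same computation. It assumes `Alphabet` covers all 256 `Word8` values; the diff doesn't show its definition.

```python
from collections import Counter
from typing import List

ALPHABET_SIZE = 256  # assumption: Alphabet spans every Word8 value

def get_frequency(data: bytes) -> List[int]:
    """Count how often each byte value occurs.

    Every element contributes +1 to its own bin independently, so the
    updates can happen in any order (or in parallel) with the same result.
    """
    counts = Counter(data)  # iterating over bytes yields ints 0..255
    return [counts.get(b, 0) for b in range(ALPHABET_SIZE)]
```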

duvenaud · 2021-08-25T17:31:28Z

examples/arithmetic-coding.dx

+
+'### Demo: Lossless compression on a test string
+
+str' = "abbadcabccdd"


Can you make a longer test? I'm not convinced that there aren't lurking floating-point issues in this implementation.

A longer test here would fail. I could change every Float to Float64 for more precision and pass the longer test, but the code would look bloated. For example, this line:
top:Interval = (0., 1.)
would become
top:Interval = (FToF64 0.,FToF64 1.)
Is there a way to define a Float with arbitrary precision? I've considered using integer arithmetic instead of floating-point arithmetic for better control over precision, but that code is still a WIP.

Hmmm. Would an even longer test then fail even if you used F64? I get that this is just a demo, but if it only works for short sequences we should put some warnings in.

I agree. I'll probably make a more serious attempt at integer arithmetic before I commit to this implementation and add warnings.
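A rough way to see why longer tests fail is to measure the message's information content, -log2 of the final interval width (the sum of -log2 p over its symbols), and compare it with the float's significand bits. Once the final interval is narrower than the spacing between adjacent floats near the code value, a single float can no longer pin it down. This Python sketch assumes Dex's `Float` is 32-bit (24 significand bits) and `Float64` has 53. Treat it as a rule of thumb, not an exact bound. The test string "abbadcabccdd" has four letters at 3/12 = 1/4 each, so it needs exactly 24 bits: right at the Float32 limit.

```python
import math
from collections import Counter

# Assumption: Dex's Float is IEEE single precision; Float64 is double precision.
SIGNIFICAND_BITS = {"Float32": 24, "Float64": 53}

def bits_needed(message: str) -> float:
    """-log2 of the final interval width under the message's own symbol counts."""
    counts = Counter(message)
    n = len(message)
    return sum(-math.log2(counts[ch] / n) for ch in message)

def fits_in(message: str, float_type: str) -> bool:
    """Rule of thumb: can one float of this type still identify the final interval?"""
    return bits_needed(message) <= SIGNIFICAND_BITS[float_type]
```

Doubling the test string to 48 bits breaks Float32 but fits in Float64. Tripling it to 72 bits breaks Float64 too. So, as suspected above, Float64 only moves the limit; it doesn't remove it.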

duvenaud · 2021-08-25T19:08:45Z

examples/arithmetic-coding.dx

+
+def getProbability (l: Int) (freq: Alphabet=>Int) : Alphabet=>(Float&Float) =
+  probs = for i. IToF freq.i / IToF l
+  cums = cumProb probs


Is this going to repeat work every time it's called?

The probabilities are cached on line 92:
p = getProbability l $ getFrequency str
So it's only calculated once.
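To illustrate "compute the model once, then reuse it", here is a Python sketch of what `getProbability` appears to return. It assumes each pair is (cumulative lower bound, width) for one symbol. Building the table from the integer counts with `Fraction` keeps it exact. The table is then built a single time and passed to both the encoder and the decoder, as the line quoted above does. `get_probability` is a hypothetical Python name, not the Dex function.

```python
from fractions import Fraction
from itertools import accumulate
from typing import List, Tuple

def get_probability(length: int, freq: List[int]) -> List[Tuple[Fraction, Fraction]]:
    """Per-symbol (low, width) pairs built from integer counts, as exact fractions.

    Assumption: the Dex version pairs each symbol's cumulative lower bound with its probability.
    """
    widths = [Fraction(f, length) for f in freq]
    # Exclusive prefix sum of the widths gives each symbol's lower bound.
    lows = list(accumulate(widths, initial=Fraction(0)))[:-1]
    return list(zip(lows, widths))
```

Usage: compute `model = get_probability(l, freq)` once and hand the same `model` to every encode and decode call, rather than recomputing it per symbol.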

duvenaud · 2021-08-25T19:10:26Z

examples/arithmetic-coding.dx

+      True -> Continue
+      False ->
+        (x, w) = rule.(j@_) in 
+        case code >= x && code < (x+w) of


Is this a binary search?

It's actually just a linear search over the intervals, following most implementations I've seen. Binary search would probably scale better to larger alphabets, though.

You might be able to use searchSorted from the prelude:
https://github.com/google-research/dex-lang/blob/main/lib/prelude.dx#L1620
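This Python sketch shows what the binary-search version of the decoder's lookup could look like, with the standard library's `bisect` standing in for `searchSorted`. The semantics of `searchSorted` haven't been checked here. The boundary convention (`x <= code < x + w`, the same test the linear scan uses) is an assumption to verify against the prelude. `find_symbol` is a hypothetical helper.

```python
from bisect import bisect_right
from typing import List

def find_symbol(lows: List[float], code: float) -> int:
    """Return the index j with lows[j] <= code < lows[j + 1] (the last bin runs to 1.0).

    `lows` must be sorted ascending and start at 0.0. This takes O(log n)
    comparisons instead of the O(n) linear scan. Zero-width symbols share a
    lower bound with their successor and are skipped correctly, because
    bisect_right places `code` after all equal entries.
    """
    return bisect_right(lows, code) - 1
```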

duvenaud · 2021-08-25T19:11:09Z

examples/arithmetic-coding.dx

+the encoded letter.
+The decoding process retraces the steps of the encoding process to recover the correct letters.
+
+def encode (str: (Fin l)=>Word8) (rule: Alphabet=>(Interval->Interval)) : Float =


Why Fin l and not just in?

It gets us this error:

Type error:
Expected: ((Fin a) => Word8)
  Actual: (in => Word8)
(Solving for: [a:Int32])
update = subdivide str rule

But yeah, it does look more succinct with in.

Oh, I see. Well it's not a big deal either way.

j-towns · 2021-11-10T11:34:57Z

@cyntsh are you familiar with asymmetric numeral systems (ANS)? This is a more recent and nowadays more widely used alternative to arithmetic coding, and on a high level the only difference is that AC is 'queue-like', or first-in-first-out, whereas ANS is 'stack-like', or last-in-first-out.

The power of ANS is roughly equivalent to that of AC, but it is significantly easier to implement (I'm talking here about an 'exact' implementation in terms of integer arithmetic). In case you're interested, I've written a short (< 50 lines), pure functional, pure Python (no imports) ANS implementation which I imagine would not be too difficult to port into Dex: https://github.com/j-towns/ans-notes/blob/master/rans.py.
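As a rough illustration of the stack-like behaviour, here is a textbook range-ANS (rANS) sketch in Python. It is not a port of the linked rans.py, whose details aren't reproduced here. It skips renormalization entirely and relies on Python's arbitrary-precision integers, so the whole message lives in one big integer `state`. Each symbol has an integer frequency `freq` and cumulative start `start`, with all frequencies summing to `total`. The consequence that matters for this thread is that decoding pops symbols in reverse order of encoding (last in, first out).

```python
from typing import List, Tuple

def rans_encode(state: int, start: int, freq: int, total: int) -> int:
    """Push one symbol onto the integer state."""
    return (state // freq) * total + start + state % freq

def rans_decode(state: int, starts: List[int], freqs: List[int], total: int) -> Tuple[int, int]:
    """Pop the most recently pushed symbol; returns (symbol, previous_state)."""
    slot = state % total  # lands in [start, start + freq) of the pushed symbol
    # Linear scan for brevity; a binary search over `starts` would also work.
    sym = next(s for s in range(len(freqs)) if starts[s] <= slot < starts[s] + freqs[s])
    return sym, freqs[sym] * (state // total) + slot - starts[sym]

def encode_message(symbols: List[int], starts: List[int], freqs: List[int],
                   total: int, init_state: int = 1) -> int:
    state = init_state
    for s in symbols:
        state = rans_encode(state, starts[s], freqs[s], total)
    return state

def decode_message(state: int, n: int, starts: List[int], freqs: List[int],
                   total: int) -> Tuple[List[int], int]:
    out = []
    for _ in range(n):
        s, state = rans_decode(state, starts, freqs, total)
        out.append(s)
    out.reverse()  # LIFO: symbols come back last-in-first-out
    return out, state
```

All arithmetic is on integers, so there is no rounding. A round trip also restores the initial state exactly, which is a convenient correctness check.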

cyntsh · 2021-11-10T22:28:46Z

Thanks for the suggestion, @j-towns ! Your ANS implementation looks incredibly compact - I'll have a read and give it a try sometime this week.

j-towns · 2022-01-06T08:04:55Z

@cyntsh how come you can't use the Word types and bitwise operations defined in the Prelude (these ones)?

cyntsh · 2022-01-07T05:04:43Z

@j-towns Good point; that does improve the compression accuracy, but it's still not quite at the level of your Python implementation. I suspect the problem lies in the comparison operators (<, >) once Word64 values get large enough. After writing an Ord instance for Word64, here's what I get:

w: Word64 = (one .<<. 63)
:p w
> 0x8000000000000000
:p w > zero
> False
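One plausible explanation, not confirmed in this thread, is that the comparison treats the 64-bit pattern as a signed integer. 0x8000000000000000 is the most negative two's-complement Int64, which would make `w > zero` False. This Python sketch shows that reinterpretation and the usual workaround: flip the sign bit of both operands, after which a signed comparison orders them as unsigned values. `as_signed64` and `unsigned_lt` are hypothetical helpers, not Dex prelude functions.

```python
MASK64 = (1 << 64) - 1
SIGN_BIT = 1 << 63

def as_signed64(w: int) -> int:
    """Reinterpret a 64-bit unsigned bit pattern as a two's-complement signed integer."""
    w &= MASK64
    return w - (1 << 64) if w & SIGN_BIT else w

def unsigned_lt(a: int, b: int) -> bool:
    """Unsigned 64-bit a < b computed with only a signed comparison.

    Flipping the sign bit maps 0..2**64-1 monotonically onto -2**63..2**63-1.
    """
    return as_signed64(a ^ SIGN_BIT) < as_signed64(b ^ SIGN_BIT)

w = 1 << 63  # 0x8000000000000000, as in the REPL session above
```

Here `as_signed64(w) > 0` is False, reproducing the symptom, while `unsigned_lt(0, w)` is True.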

cyntsh added 15 commits August 6, 2021 17:18

e6e3585 · Create arithmetic-coding.dx
edcf280 · Clean up
84b5fa8 · switch from floating point to integer arithmetic
72b1202 · Some refactoring
7f65307 · Another WIP
8003af5 · Encoding algo completed
3c934e7 · First working version. ish.
d616f7f · Utter refactor
77341c9 · Deal with enqueue overflow
31d0b37 · List of codes for dequeue + minor bugs
2b48f4c · WIP
4d876f1 · Relax interval constraint
d2dd11f · Revert and clean up
4c43d9b · Add comments
7ebc207 · Edit comments

google-cla bot added the cla: yes label Aug 25, 2021

duvenaud reviewed Aug 25, 2021


6a9595b · rANS
6b48407 · Use Word64 type

apaszke force-pushed the main branch from 46b8727 to 8db43fc on May 13, 2022 14:51
