Demo: Speech processing MFCC demo #603

srush · 2021-07-17T00:11:57Z

This is a demo of speech preprocessing. It is mostly an attempt to try out some applied signal processing and see what you might be able to do. It includes:

Reading and using WAV files
Some applications of FFT and DCT
A Dex implementation of the mel scale and the overlapping-windows scaling method

Things that are great:

Really liking signal processing in Dex. It feels very natural.
The parser library is fun and easy to use.
The new bit interface is cool! (I think I messed it up though; I'm not sure how to go from Word8s to Int16.)
Windowing / convolutions / padding turned out nice.

Things that I am still struggling with:

Slicing is kind of annoying, and I end up with too many types.
Special casing on loops always feels a bit verbose, particularly when matching indices.
Too much ordinal arithmetic. I think that needs to be in the prelude. It is really hard to reverse an ordinal, get neighbors, check order, etc.
Plots could be prettier. Like connecting dots?

Things that are a problem:

I have like three real jobs, and I spend too much time writing Dex.

apaszke

This is pretty cool! I learn something every time I go through your PRs.

While this already looks good, I am hoping that we might be able to polish the indexing portions a little more, and also possibly figure out what we should change in the language to make it better.

apaszke · 2021-07-18T09:31:50Z

examples/alignment.dx

+z = FToI (pow 2.0 15.0)
+def W32ToI (x : Word32): Int  =
+    y:Int = internalCast _ x
+    select (y <= z) y ((-1)*z + (y- z))


Is that meant to reinterpret the Word32 as a two's-complement encoded integer?

srush · 2021-07-19

Yes! This is super hacky, but I couldn't come up with a better way. The WAV file stores Int16s in this format.
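
For readers following along outside Dex, the reinterpretation discussed here can be sketched in Python. This is a minimal illustration, assuming the value arrives as an unsigned integer in [0, 2^16); the name `to_int16` is hypothetical and does not appear in the PR. Note that the boundary value 2^15 itself has its top bit set, so it maps to -2^15.

```python
def to_int16(u: int) -> int:
    """Reinterpret an unsigned 16-bit value as a two's-complement signed integer."""
    if not 0 <= u < (1 << 16):
        raise ValueError("expected a value in [0, 65536)")
    # Values with the top bit set (>= 2**15) encode negatives: subtract 2**16.
    return u - (1 << 16) if u >= (1 << 15) else u
```

The subtraction of 2^16 is the same arithmetic as `(-1)*z + (y - z)` with `z = 2^15` in the diff above.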

apaszke · 2021-07-18T09:47:58Z

examples/alignment.dx

+    iter \i .
+         case i < n of
+              True ->
+                  _ = parse h (pChar ls.(i@_))


You might want to use unsafeFromOrdinal here. I guess we should add a !@! operator for that at some point.

srush · 2021-07-19

What is the benefit? Is it faster?

apaszke · 2021-07-19

Yup, faster both to compile and to execute. The downside is that if you do end up casting an out-of-range integer, you can get memory corruption and segfaults.

apaszke · 2021-07-18T21:21:42Z

examples/alignment.dx

+MelBins = Fin 28
+
+hscale : ScaledRange Positive = AsScaledRange {start=0.0, end=samplerate / 2.0}
+melscale : ScaledRange MelBins = AsScaledRange {start=mel 0.0, end=mel (samplerate / 2.0)}


It might be worth mentioning why you pick samplerate / 2.0 as the end.
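
As background for this question (not the author's stated explanation): a signal sampled at `samplerate` can only represent frequencies up to `samplerate / 2`, the Nyquist frequency, so a mel filter bank usually spans 0 Hz to that limit. The Python sketch below assumes the common formula mel(f) = 2595 * log10(1 + f / 700); the PR's `mel` function may use a different variant, and the names `hz_to_mel`, `mel_to_hz`, and `mel_bin_edges` are hypothetical.

```python
import math

def hz_to_mel(f_hz: float) -> float:
    # Assumed mel formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_bin_edges(samplerate: float, n_bins: int) -> list:
    """Edge frequencies (Hz) for n_bins overlapping filters, evenly spaced in mel."""
    nyquist = samplerate / 2.0  # highest frequency the sampled signal can represent
    top = hz_to_mel(nyquist)
    # n_bins overlapping filters need n_bins + 2 edge points.
    return [mel_to_hz(top * i / (n_bins + 1)) for i in range(n_bins + 2)]
```

With 28 bins, as in `MelBins = Fin 28`, this yields 30 edge frequencies that start at 0 Hz and end at the Nyquist frequency, spaced more densely at low frequencies.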

google-cla · 2021-07-21T04:27:06Z

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment "@googlebot I fixed it." If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

srush · 2021-07-21T04:36:43Z

Still not fully cleaned up, but made a second attempt at typed windowing and padding (added to the prelude for now). Think this better gets at the core idea of running a sliding window filter over a statically sized table. Might just be being stubborn here, but trying to avoid using List when not necessary.

duvenaud · 2021-08-05T16:14:29Z

Re: how to go from word8's to int16, I figured this out the other day:

def pixel (x:Char) : Int =
  r = W8ToI x
  case r < 0 of
    True -> 256 + r
    False -> r

I'm pretty sure the code in the nn.dx example is wrong:
https://github.com/google-research/dex-lang/blob/main/examples/nn.dx#L193

def pixel (x:Char) : Float32 =
     r = W8ToI x
     IToF case r < 0 of
             True -> (abs r) + 128
             False -> r
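
To make the difference between the two versions concrete, here is a small Python comparison. It assumes that `W8ToI` yields a signed value in [-128, 127], which is what the `r < 0` test in both snippets suggests; the names `as_signed`, `pixel_fixed`, and `pixel_nn` are hypothetical.

```python
def as_signed(b: int) -> int:
    # Model of the assumed W8ToI behavior: byte 0..255 read as signed -128..127.
    return b - 256 if b >= 128 else b

def pixel_fixed(r: int) -> int:
    # First version above: add 256 to negative values.
    return 256 + r if r < 0 else r

def pixel_nn(r: int) -> int:
    # Version quoted from nn.dx: abs(r) + 128 for negative values.
    return abs(r) + 128 if r < 0 else r

# Bytes for which the nn.dx-style conversion does not recover the original value.
mismatches = [b for b in range(256) if pixel_nn(as_signed(b)) != b]
```

Under that assumption, `pixel_fixed` recovers every byte value, while `pixel_nn` maps byte 255 (signed -1) to 129 and byte 128 (signed -128) to 256, which is outside the byte range.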

srush · 2021-08-05T18:12:11Z

Thanks, this is helpful. Wonder if these are both horribly inefficient though.

Also, in the WAV format the samples are two-byte ints. Any ideas?

duvenaud · 2021-08-05T21:05:07Z

Hmmm, for two byte ints, I guess I would just add an Int16 type to the Dex compiler and prelude, by copying all the Int32 stuff.
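
Until such a type exists, the two-byte case can be handled arithmetically. The Python sketch below illustrates the idea for 16-bit PCM WAV data, which stores each sample as a little-endian signed integer: combine the low and high bytes, then apply the two's-complement correction. The name `decode_pcm16` is hypothetical, and this is an illustration of the arithmetic rather than a proposal for the Dex implementation.

```python
def decode_pcm16(data: bytes) -> list:
    """Decode little-endian signed 16-bit samples from raw bytes."""
    if len(data) % 2 != 0:
        raise ValueError("expected an even number of bytes")
    samples = []
    for i in range(0, len(data), 2):
        lo, hi = data[i], data[i + 1]
        u = lo + 256 * hi  # unsigned value in [0, 65536)
        # Top bit set means negative in two's complement.
        samples.append(u - 65536 if u >= 32768 else u)
    return samples
```

The same steps need only byte-to-integer conversion, multiplication, addition, and a comparison, so they should translate to a language without a native 16-bit integer type.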

apaszke · 2021-08-24T11:21:32Z

lib/prelude.dx

+           m => (Window left right) => n =
+    for i : m.
+        for j : (Window left right).
+            k = fromOrdinal _ $ (ordinal i) + (ordinal j)


This should really use unsafeFromOrdinal

apaszke · 2021-08-24T11:21:59Z

lib/prelude.dx

+' Needs to  be called with `castTable` to do the striding.
+
+def stride (tab : (m & Fin len)=>n) : m => n =
+    for i. tab.(i, 0@_)


It would be better to do 0@_ outside of the loop, because at the moment we might fail to hoist it and it's not free.

apaszke · 2021-08-24T11:23:06Z

lib/prelude.dx

+         pad (for i. init) (for j. pad init tab.j)
+
+def stride2 (tab : (m & Fin vlen)=>(n & Fin hlen) => o) : m => n => o=
+    for i j. tab.(i, 0@_).(j, 0@_)


For now can we keep those functions in the example? I'm thinking of adding some support for windowing via a generalization of the tiling infrastructure we already have (windowing is just tiling with overlapping tiles!)

apaszke · 2021-08-24T11:24:32Z

examples/alignment.dx

+PostWindow = Fin $ idiv (size Datsize) (size Step)
+
+frame_split : PostWindow => FrameWindow => Float =
+             stride $ castTable (_ & Step) $ window $ pad 0.0 signal


wow this is pretty neat
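
For readers less familiar with Dex, the pad / window / stride pipeline in this snippet can be sketched in Python. This is a rough analogue under stated assumptions, not a translation of the PR: padding is assumed to go at the end of the signal only, and the names `frame_split`, `width`, and `step` are hypothetical stand-ins for `FrameWindow` and `Step`.

```python
def frame_split(signal, width, step, pad_value=0.0):
    """Split a signal into overlapping frames of `width` samples, `step` apart."""
    n = len(signal)
    # Pad so that a full window exists at every start position (assumed: trailing pad).
    padded = list(signal) + [pad_value] * (width - 1)
    # One window per sample position: the "window" stage.
    windows = [padded[i:i + width] for i in range(n)]
    # Keep every step-th window, n // step frames in total: the "stride" stage.
    return windows[::step][: n // step]
```

For example, `frame_split([1, 2, 3, 4, 5, 6], width=3, step=2)` produces the frames `[1, 2, 3]`, `[3, 4, 5]`, and `[5, 6, 0.0]`. Consecutive frames share samples whenever `step` is smaller than `width`, which is the sense in which windowing is tiling with overlapping tiles.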

srush added 3 commits July 16, 2021 19:51

MFCC speech processing demo

fb1bd0b

fft

b77105f

scael

26c9d8b

google-cla bot added the cla: yes label Jul 17, 2021

apaszke approved these changes Jul 19, 2021

google-cla bot added cla: no and removed cla: yes labels Jul 21, 2021

Move around some of the functions

31317d6

srush force-pushed the bio branch from 08bb35b to 31317d6 Compare July 21, 2021 04:30

google-cla bot added cla: yes and removed cla: no labels Jul 21, 2021

update

58dcfb3

apaszke reviewed Aug 24, 2021

apaszke force-pushed the main branch from 46b8727 to 8db43fc Compare May 13, 2022 14:51
