Demo: Speech processing MFCC demo #603

Status: Open (5 commits to merge into base: main)

Conversation

@srush srush commented Jul 17, 2021

This is a demo of speech preprocessing, mostly an attempt to try out some applied signal processing in Dex and see what is possible. It includes:

  • Reading and using Wave files
  • Some application of FFT and DCT
  • Dex implementation of the Mel Scale and the overlapping windows scaling method.
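
As a point of reference for the DCT step of an MFCC pipeline, here is a minimal sketch of a naive, unnormalized DCT-II. It is written in Python rather than Dex, the name `dct2` is hypothetical, and it is not the implementation in this PR; it only illustrates the transform that is typically applied to log mel energies.

```python
import math


def dct2(values):
    """Naive O(n^2) unnormalized DCT-II of a sequence of floats."""
    n = len(values)
    return [
        # X_k = sum_i x_i * cos(pi / n * (i + 1/2) * k)
        sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(values))
        for k in range(n)
    ]
```

A constant input puts all of its energy in the first coefficient, which is a quick sanity check for any DCT-II implementation.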

Things that are great:

  • Really liking signal processing in dex. Feels very natural.
  • Parser library is fun and easy to use.
  • New bit interface is cool! (I think I messed it up though; I'm not sure how to go from Word8s to an Int16.)
  • Windowing / convolutions / padding turned out nice.

Things that I am still struggling with:

  • Slicing is kind of annoying, and I end up with too many types.
  • Special casing on loops always feels a bit verbose, particularly when matching indices.
  • Too much ordinal arithmetic. I think that needs to be in the prelude. It is really hard to reverse an ordinal, get neighbors, check order etc.
  • Plots could be prettier. Like connecting dots?

Things that are a problem:

  • I have like three real jobs, and I spend too much time writing dex.

@google-cla google-cla bot added the cla: yes label Jul 17, 2021

@apaszke apaszke left a comment

This is pretty cool! I learn something every time I go through your PRs.

While this already looks good, I am hoping that we might be able to polish the indexing portions a little more, and also possibly figure out what we should change in the language to make it better.

examples/alignment.dx (outdated)
z = FToI (pow 2.0 15.0)
def W32ToI (x : Word32): Int =
  y:Int = internalCast _ x
  select (y <= z) y ((-1)*z + (y- z))

Collaborator

Is that meant to reinterpret the Word32 as a twos complement encoded integer?

Contributor Author

Yes! This is super hacky, but I couldn't come up with a better way. WAV files store int16s in this format.
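
For readers following the thread, the reinterpretation being discussed can be sketched outside Dex. The Python below is an illustration only, not the PR's code; `to_signed` and its `bits` parameter are hypothetical names. It assumes the input is the unsigned value of a 16-bit sample and compares strictly against 2^15, so that 0x8000 maps to -32768 (with a non-strict comparison, 32768 itself would stay positive).

```python
def to_signed(value: int, bits: int = 16) -> int:
    """Reinterpret an unsigned integer as a two's complement signed value."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"value does not fit in {bits} bits: {value}")
    sign_bit = 1 << (bits - 1)
    # Values below the sign bit are non-negative; the rest wrap around by 2^bits.
    return value if value < sign_bit else value - (1 << bits)
```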

iter \i .
  case i < n of
    True ->
      _ = parse h (pChar ls.(i@_))

Collaborator

You might want to use unsafeFromOrdinal here. I guess we should add a !@! operator for that at some point

Contributor Author

What is the benefit? Is it faster?

Collaborator

Yup, faster both to compile and to execute. The downside is that if you do end up casting an out-of-range integer, you can get memory corruption and segfaults.

examples/alignment.dx (outdated)
examples/alignment.dx (outdated)
examples/alignment.dx
lib/fft.dx (outdated)
MelBins = Fin 28

hscale : ScaledRange Positive = AsScaledRange {start=0.0, end=samplerate / 2.0}
melscale : ScaledRange MelBins = AsScaledRange {start=mel 0.0, end=mel (samplerate / 2.0)}

Collaborator

Might be worth mentioning why you pick samplerate / 2.0 as the end.
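
Some background on this point: a signal sampled at rate fs can only represent frequencies up to fs / 2 (the Nyquist frequency), so that is the natural upper edge of the frequency range. The Python sketch below spaces bin edges evenly on the mel scale between 0 Hz and Nyquist. It assumes the commonly used formula mel(f) = 2595 · log10(1 + f / 700), which may or may not match the `mel` function in this PR; `hz_to_mel`, `mel_to_hz`, and `mel_bin_edges` are hypothetical names.

```python
import math


def hz_to_mel(freq_hz: float) -> float:
    """Convert Hz to mel using a common formula (an assumption, see above)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_bin_edges(sample_rate: float, n_bins: int):
    """Return n_bins + 1 edge frequencies in Hz, evenly spaced in mel."""
    nyquist = sample_rate / 2.0  # highest frequency the sampled signal can represent
    lo, hi = hz_to_mel(0.0), hz_to_mel(nyquist)
    step = (hi - lo) / n_bins
    return [mel_to_hz(lo + i * step) for i in range(n_bins + 1)]
```

For example, `mel_bin_edges(16000.0, 28)` gives 29 edges running from 0 Hz to about 8000 Hz; the 16 kHz sample rate here is only an illustrative value.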

examples/alignment.dx (outdated)
examples/alignment.dx (outdated)
google-cla bot commented Jul 21, 2021

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@google-cla google-cla bot added cla: no and removed cla: yes labels Jul 21, 2021
srush commented Jul 21, 2021

Still not fully cleaned up, but made a second attempt at typed windowing and padding (added to the prelude for now). Think this better gets at the core idea of running a sliding window filter over a statically sized table. Might just be being stubborn here, but trying to avoid using List when not necessary.

duvenaud commented Aug 5, 2021

Re: how to go from word8's to int16, I figured this out the other day:

def pixel (x:Char) : Int =
  r = W8ToI x
  case r < 0 of
    True -> 256 + r
    False -> r

I'm pretty sure the code in the nn.dx example is wrong:
https://github.com/google-research/dex-lang/blob/main/examples/nn.dx#L193

def pixel (x:Char) : Float32 =
     r = W8ToI x
     IToF case r < 0 of
             True -> (abs r) + 128
             False -> r
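
To make the difference between the two versions concrete, here is a small Python sketch. It assumes, as the snippets above imply, that `W8ToI` yields a signed value in [-128, 127]; the function names are hypothetical and this is not Dex code.

```python
def signed_byte_to_unsigned(r: int) -> int:
    """Map a signed byte in [-128, 127] to its unsigned value in [0, 255]."""
    return r + 256 if r < 0 else r


def nn_example_variant(r: int) -> int:
    """The arithmetic from the nn.dx snippet quoted above, for comparison."""
    return abs(r) + 128 if r < 0 else r


# Negative inputs on which the two versions disagree.
mismatches = [
    r for r in range(-128, 0)
    if signed_byte_to_unsigned(r) != nn_example_variant(r)
]
```

Under that assumption the two agree on every non-negative input but disagree on all negative inputs except -64; for instance, the byte 0xFF (signed -1) should be 255, while the second version gives 129.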

srush commented Aug 5, 2021

Thanks, this is helpful. Wonder if these are both horribly inefficient though.

Also, in the WAV format the samples are two-byte ints. Any ideas?

duvenaud commented Aug 5, 2021

Hmmm, for two byte ints, I guess I would just add an Int16 type to the Dex compiler and prelude, by copying all the Int32 stuff.
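
For comparison, this is roughly what the two-byte case looks like in a language that already has the needed primitives. The Python sketch below assumes 16-bit PCM WAV data, which stores samples as little-endian signed integers; `int16_from_bytes` and `decode_pcm16` are hypothetical helper names, not part of Dex or of this PR.

```python
import struct
from typing import List


def int16_from_bytes(lo: int, hi: int) -> int:
    """Combine a low and a high byte (each 0..255) into a signed 16-bit value."""
    return int.from_bytes(bytes([lo, hi]), byteorder="little", signed=True)


def decode_pcm16(raw: bytes) -> List[int]:
    """Decode little-endian signed 16-bit samples, ignoring a trailing odd byte."""
    count = len(raw) // 2
    return list(struct.unpack(f"<{count}h", raw[: count * 2]))
```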

m => (Window left right) => n =
  for i : m.
    for j : (Window left right).
      k = fromOrdinal _ $ (ordinal i) + (ordinal j)

Collaborator

This should really use unsafeFromOrdinal

' Needs to be called with `castTable` to do the striding.

def stride (tab : (m & Fin len)=>n) : m => n =
  for i. tab.(i, 0@_)

Collaborator

It would be better to compute 0@_ outside of the loop; at the moment we might fail to hoist it, and it's not free.

pad (for i. init) (for j. pad init tab.j)

def stride2 (tab : (m & Fin vlen)=>(n & Fin hlen) => o) : m => n => o =
  for i j. tab.(i, 0@_).(j, 0@_)

Collaborator

For now can we keep those functions in the example? I'm thinking of adding some support for windowing via a generalization of the tiling infrastructure we already have (windowing is just tiling with overlapping tiles!)

PostWindow = Fin $ idiv (size Datsize) (size Step)

frame_split : PostWindow => FrameWindow => Float =
  stride $ castTable (_ & Step) $ window $ pad 0.0 signal

Collaborator

Wow, this is pretty neat.
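
The pad, window, and stride pipeline above can be sketched with plain Python lists. This is an untyped illustration under stated assumptions, not a translation of the Dex code: the helper names and the `left`, `right`, and `step` parameters are hypothetical, and the window width is assumed to be `left + right + 1`.

```python
def pad(signal, left, right, fill=0.0):
    """Add `left` fill values before the signal and `right` after it."""
    return [fill] * left + list(signal) + [fill] * right


def sliding_windows(padded, width):
    """Return every contiguous window of `width` samples."""
    return [padded[i:i + width] for i in range(len(padded) - width + 1)]


def stride(windows, step):
    """Group windows into blocks of `step` and keep the first of each block."""
    return [windows[g * step] for g in range(len(windows) // step)]


def frame_split(signal, left, right, step):
    """Pad, take all sliding windows, then keep every `step`-th window."""
    width = left + right + 1
    return stride(sliding_windows(pad(signal, left, right), width), step)
```

With a six-sample signal, one sample of padding on each side, and a step of 2, this keeps three overlapping frames of three samples each, matching the `idiv (size Datsize) (size Step)` frame count in the snippet above.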

4 participants