Here's a C# demo that shows how to stream compressed audio such as MP3 and Opus from Azure File Storage to Azure's Speech-to-Text service, transcribe it to plain text, and get detailed information such as the timestamp and duration of each spoken word in the audio file.
Out of the box, Azure Speech to Text only works with WAV audio, and if you want to stream compressed audio such as MP3 or Opus, Microsoft recommends installing and using GStreamer. That's all nice and dandy if you're building a desktop application, but if you want to publish your app online as a web service, things get way trickier.
By using the NuGet packages Concentus, Concentus.OggFile and NAudio, we can decode the compressed audio to WAV and then stream it to Azure Speech to get the audio as text, without the need for GStreamer. Yay!
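
To give a rough idea of the "stream it to Azure Speech" half, here is a minimal sketch and not the demo's actual code. It assumes the audio has already been decoded to 16 kHz, 16-bit, mono PCM (for example with Concentus or NAudio) and is available as a `Stream`. The names `PcmSpeechStreamer`, `ToPcmBytes` and `TranscribeAsync` are made up for illustration, and `key` and `region` stand in for your own Speech resource settings.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.CognitiveServices.Speech;
using Microsoft.CognitiveServices.Speech.Audio;

// Hypothetical helper class, not part of any of the libraries mentioned above.
public static class PcmSpeechStreamer
{
    // Converts decoded 16-bit samples to the little-endian byte layout used by PCM WAV data.
    public static byte[] ToPcmBytes(short[] samples)
    {
        var bytes = new byte[samples.Length * 2];
        for (int i = 0; i < samples.Length; i++)
        {
            bytes[2 * i] = (byte)(samples[i] & 0xFF);            // low byte first
            bytes[2 * i + 1] = (byte)((samples[i] >> 8) & 0xFF); // then high byte
        }
        return bytes;
    }

    // Assumption: pcm yields raw 16 kHz, 16-bit, mono PCM without a WAV header.
    public static async Task TranscribeAsync(Stream pcm, string key, string region)
    {
        var config = SpeechConfig.FromSubscription(key, region);
        config.RequestWordLevelTimestamps(); // ask for per-word offsets and durations

        var format = AudioStreamFormat.GetWaveFormatPCM(16000, 16, 1);
        using var pushStream = AudioInputStream.CreatePushStream(format);
        using var audioConfig = AudioConfig.FromStreamInput(pushStream);
        using var recognizer = new SpeechRecognizer(config, audioConfig);

        var done = new TaskCompletionSource<bool>();
        recognizer.Recognized += (s, e) =>
        {
            if (e.Result.Reason == ResultReason.RecognizedSpeech)
            {
                Console.WriteLine(e.Result.Text);
                // The detailed JSON result carries the word-level timing information.
                Console.WriteLine(e.Result.Properties.GetProperty(
                    PropertyId.SpeechServiceResponse_JsonResult));
            }
        };
        // Canceled fires on errors and also when the end of the stream is reached.
        recognizer.Canceled += (s, e) => done.TrySetResult(false);
        recognizer.SessionStopped += (s, e) => done.TrySetResult(true);

        await recognizer.StartContinuousRecognitionAsync();

        // 3200 bytes is 100 ms of 16 kHz, 16-bit, mono audio.
        var buffer = new byte[3200];
        int read;
        while ((read = await pcm.ReadAsync(buffer, 0, buffer.Length)) > 0)
        {
            pushStream.Write(buffer, read);
        }
        pushStream.Close(); // signals that no more audio is coming

        await done.Task;
        await recognizer.StopContinuousRecognitionAsync();
    }
}
```

A decoder that hands back `short[]` samples could pass each packet through `ToPcmBytes` before writing it to the push stream. Pushing small chunks keeps memory use low, which matters when the source file is streamed from storage instead of loaded in full.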