Release 1.3.0 - UI Overhaul and New Backend · Dadangdut33/Speech-Translate

The 1.3.0 release is finally ready. This release fixes many bugs and improves the app throughout. The backend now uses stable-whisper, which should produce more stable and improved results. The whole user interface has also been redesigned, so the user experience should be much better.

For this release I also provide an app installer instead of a 7-Zip extractable .exe.

Before downloading or installing, please take a look at the wiki and read the getting started section.

What's Changed

1.3.0 by @Dadangdut33 in #47, thanks to everyone who submitted bug reports and feature requests
Added word level transcription #10 thanks @MaxHaller91 for the request
Added file process indicator
Added color coding for accuracy
Added faster whisper
Added character limit #44 thanks @LearningJer for the request
Added ways to install ffmpeg inside the app
Added customizable output format #42 thanks @joebinglab for the request
Added refinement, alignment, and translation of result
Added ability to export a record session with file-like output
Added keyboard support for combobox
Added VAD option to record session
Added audiometer for record session indicator
Added ability to use either ndarray or temp file for record session
Added multiple whisper models to the translation engine combobox
Added copy to clipboard button #36 thanks @MirkoPMC for the request
Changed backend to stable whisper #27 thanks @k566o for the report
Changed vanilla logger to loguru
Changed subtitle window to use tkhtml label
Fixed wrong language code #34 thanks @SugarQuiet for the report
Fixed crash that might happen in record session #31 #40 thanks @yslion @FerriteGiant for the report
Fixed clearing on record session
Fixed subtitle window dragging
Fixed device query #41 thanks @IcarusAegis for the report
Fixed mixed-up filenames #32 thanks @Corvalan for the report

Full Changelog: 1.2.3...1.3.0

Requirements

Compatible OS:

OS	Prebuilt binary	As a module
Windows	✔️	✔️
MacOS	❌	✔️
Linux	❌	✔️

* Python 3.8 or later (3.11 is recommended) for installation as a module.

Speaker input only works on Windows 8 and above.
Internet connection (for translation with API)
FFmpeg must be installed and added to the PATH environment variable. You can do this when prompted in the app, or you can download it here and add it to your PATH manually. Alternatively, you can download it and add it to PATH automatically by using the following commands:

# on Windows using PowerShell (Also included in the release page, and can be run by right clicking and selecting "Run with PowerShell")
# Must be run in an elevated PowerShell prompt (Run as administrator)
Set-ExecutionPolicy RemoteSigned -Scope CurrentUser # Optional: Needed to run a remote script the first time
& ([scriptblock]::Create(
     (New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/Dadangdut33/Speech-Translate/master/install_ffmpeg.ps1')
  )) -webdl

# on Windows using Winget (Default package manager for Windows 10 and above)
winget install --id=Gyan.FFmpeg -e

# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg

# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg

# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg

# on Arch Linux
sudo pacman -S ffmpeg

# on MacOS using Homebrew (https://brew.sh/)
brew install ffmpeg
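
After installing, it can be useful to confirm that the Python version and FFmpeg requirements listed above are actually met. The following is only a minimal sketch using the Python standard library; `find_problems` and `MIN_PYTHON` are hypothetical names made up for this illustration and are not part of Speech-Translate.

```python
import shutil
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the requirements above


def find_problems(python_version, ffmpeg_path):
    """Hypothetical helper: return a list of problems, empty if none were found.

    python_version: a (major, minor, ...) tuple such as sys.version_info
    ffmpeg_path: path to the ffmpeg executable, or None if it is not on PATH
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]} or later is required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    if ffmpeg_path is None:
        problems.append("ffmpeg was not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when no executable named "ffmpeg" is on PATH
    issues = find_problems(sys.version_info, shutil.which("ffmpeg"))
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Python version and FFmpeg look fine.")
```

If FFmpeg was just installed, a new terminal may be needed before the updated PATH is visible to this check.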

A capable GPU with CUDA support is recommended for running the models (the prebuilt version uses CUDA 11.8). Each whisper model has different requirements; for more information, check the whisper repository directly.

Size	Parameters	English-only model	Multilingual model	Required VRAM	Relative speed
tiny	39 M	`tiny.en`	`tiny`	~1 GB	~32x
base	74 M	`base.en`	`base`	~1 GB	~16x
small	244 M	`small.en`	`small`	~2 GB	~6x
medium	769 M	`medium.en`	`medium`	~5 GB	~2x
large	1550 M	N/A	`large`	~10 GB	1x

* This information is also available in the app (hover over the model selection and a tooltip with the model info will appear). Also note that when using faster-whisper, the speed will be significantly faster and the required VRAM will be reduced depending on usage; for more information, please visit the faster-whisper repository.
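
As a rough illustration of how the table above might be used, the sketch below picks the largest model whose approximate VRAM requirement fits a given budget. The numbers are copied from the table and are approximate; `REQUIRED_VRAM_GB` and `largest_model_that_fits` are hypothetical names for this example, not something the app exposes, and the figures do not account for the reduced usage of faster-whisper.

```python
# Approximate VRAM needs in GB, copied from the table above (smallest to largest).
REQUIRED_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def largest_model_that_fits(available_vram_gb):
    """Hypothetical helper: return the largest model that fits, or None."""
    best = None
    # Dicts keep insertion order, so later matches are larger models.
    for name, needed_gb in REQUIRED_VRAM_GB.items():
        if needed_gb <= available_vram_gb:
            best = name
    return best


if __name__ == "__main__":
    for vram in (0.5, 1, 4, 6, 12):
        print(f"{vram} GB -> {largest_model_that_fits(vram)}")
```

Because the table values are rough estimates, leaving some headroom below the actual VRAM of the GPU is probably sensible.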
