Commit 94ba6d5 (1 parent: 0934a9e). Showing 1 changed file with 2 additions and 25 deletions.
@@ -1,26 +1,3 @@
# Project-AI-Translation
# DoRiS - Diarization of Recordings in Speech-to-text
## Friday - 2023-09-22
During the first two weeks of the project we divided it into three parts; each sub-group then studied its topic further and decided which architectures to use for its part. The goal for the first sprint, ending 2023-09-29, is to have a backend up and running with two models that can produce some kind of output; the quality is not that important yet. The project has the following parts:
### Backend
[@IsakLundstrom](https://github.com/IsakLundstrom), [@Racix](https://www.github.com/Racix), [@Lundsak](https://github.com/Lundsak)
The current state of the backend (a minimal endpoint sketch follows the list):
* REST API - [FastAPI](https://fastapi.tiangolo.com/) (Python)
* Database - MongoDB or Cassandra
* Streaming - Not yet decided.
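As a rough illustration of what the FastAPI side could look like, here is a minimal upload endpoint. The route path, the in-memory job store, and all names are our own illustrative assumptions, not code from the project:

```python
from fastapi import FastAPI, UploadFile

app = FastAPI()

# In-memory stand-in for the MongoDB/Cassandra store discussed above
# (assumption for the sketch; not the project's actual storage layer).
jobs: dict[int, str] = {}

@app.post("/media")
async def upload_media(file: UploadFile) -> dict:
    # Register the upload and return a job id the client can poll later.
    job_id = len(jobs) + 1
    jobs[job_id] = file.filename
    return {"id": job_id, "status": "queued"}
```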
### Transcription model
[@ClaudeHallard](https://github.com/ClaudeHallard), [@hamidehsani54](https://github.com/hamidehsani54)
The Whisper model will be used for transcribing the audio/video files, and the following Whisper implementations have been tested (a usage sketch follows the list):
* [WhisperJAX](https://github.com/sanchit-gandhi/whisper-jax)
* [faster_whisper](https://github.com/guillaumekln/faster-whisper)
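For reference, transcribing a file with faster_whisper looks roughly like the sketch below; the model size, compute settings, and file name are placeholder assumptions:

```python
from faster_whisper import WhisperModel

# Model size and compute settings are assumptions chosen for a CPU-only test.
model = WhisperModel("small", device="cpu", compute_type="int8")

# "meeting.wav" is a placeholder file name.
segments, info = model.transcribe("meeting.wav")

print(f"Detected language: {info.language}")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```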
### Diarization model
[@oskar-dah](https://github.com/oskar-dah), [@langemittbacken](https://github.com/langemittbacken), [@tonytonfisk2](https://github.com/tonytonfisk2)
To determine who is speaking, two diarization models have been tested (a usage sketch follows the list). pyannote was easy to set up but did not perform very well, so we decided to also test the NeMo model, which we have not yet been able to evaluate since it has been more difficult to get running.
* [NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_diarization/intro.html)
* [pyannote](https://github.com/pyannote/pyannote-audio)
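For comparison, a minimal pyannote run looks roughly like this, assuming the pyannote.audio 2.x API; the pipeline name, access token, and file name are placeholders (the pretrained pipeline requires accepting its terms on the Hugging Face Hub):

```python
from pyannote.audio import Pipeline

# Pretrained pipeline from the Hugging Face Hub; "hf_..." stands in for a
# real access token and is not a working value.
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization", use_auth_token="hf_...")

# "meeting.wav" is a placeholder file name.
diarization = pipeline("meeting.wav")

# Print one line per speaker turn: start time, end time, speaker label.
for turn, _, speaker in diarization.itertracks(yield_label=True):
    print(f"{turn.start:.1f}s - {turn.end:.1f}s: {speaker}")
```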
Today, there is a pressing need for speech transcription and translation to increase the accessibility of information for everyone in society. This need arises in various forms, such as live meetings, recorded videos, and phone calls. Therefore, we are developing a service that, with the help of AI, can automate these processes to make efficient use of time and resources.