Commit

Update README.md
IsakLundstrom authored Sep 29, 2023
1 parent 0934a9e commit 94ba6d5
Showing 1 changed file with 2 additions and 25 deletions.
@@ -1,26 +1,3 @@
# Project-AI-Translation
# DoRiS - Diarization of Recordings in Speech-to-text

## Friday - 2023-09-22
During the first two weeks of the project we divided the work into three parts; each sub-group then studied its topic further and decided which architectures to use for its part. The goal for the first sprint, ending 2023-09-29, is to have a back end up and running with two models that can produce some kind of output; the quality is not that important yet. The project has the following parts:

### Backend
[@IsakLundstrom](https://github.com/IsakLundstrom), [@Racix](https://www.github.com/Racix), [@Lundsak](https://github.com/Lundsak)

The current state of the backend:
* REST API - [FastAPI](https://fastapi.tiangolo.com/) (Python)
* Database - MongoDB or Cassandra
* Streaming - not yet decided

### Transcription model
[@ClaudeHallard](https://github.com/ClaudeHallard), [@hamidehsani54](https://github.com/hamidehsani54)

The Whisper model will be used to transcribe the audio/video files, and the following Whisper implementations have been tested:
* [WhisperJAX](https://github.com/sanchit-gandhi/whisper-jax)
* [faster-whisper](https://github.com/guillaumekln/faster-whisper)

### Diarization model
[@oskar-dah](https://github.com/oskar-dah), [@langemittbacken](https://github.com/langemittbacken), [@tonytonfisk2](https://github.com/tonytonfisk2)

To determine who is speaking, two diarization models have been considered. pyannote was easy to set up but did not perform very well, so we decided to also try NeMo; however, NeMo has been more difficult to get running and we have not yet been able to test it.
* [NeMo](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/speaker_diarization/intro.html)
* [pyannote](https://github.com/pyannote/pyannote-audio)
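Whichever diarization model is chosen, its output will eventually need to be combined with the transcription segments to produce a speaker-labelled transcript. A minimal, library-free sketch of one way that alignment step could work (the data layout and function name here are hypothetical illustrations, not the project's actual interface): each transcript segment is assigned the speaker whose diarization turn overlaps it the most in time.

```python
def assign_speakers(segments, turns):
    """Attach a speaker label to each transcript segment.

    segments: list of (start, end, text) from the transcription model.
    turns:    list of (start, end, speaker) from the diarization model.
    Returns a list of (start, end, speaker, text), picking for each
    segment the speaker whose turn has the largest temporal overlap.
    """
    result = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "unknown", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the intersection of the two time intervals.
            overlap = max(0.0, min(seg_end, turn_end) - max(seg_start, turn_start))
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        result.append((seg_start, seg_end, best_speaker, text))
    return result


# Hypothetical outputs from the two models (times in seconds).
segments = [(0.0, 2.5, "Hello everyone."), (2.5, 5.0, "Hi, good morning.")]
turns = [(0.0, 2.4, "SPEAKER_00"), (2.4, 5.1, "SPEAKER_01")]

merged = assign_speakers(segments, turns)
# merged: [(0.0, 2.5, 'SPEAKER_00', 'Hello everyone.'),
#          (2.5, 5.0, 'SPEAKER_01', 'Hi, good morning.')]
```

Note that the model boundaries rarely line up exactly, which is why a largest-overlap rule is used rather than requiring identical timestamps.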
Today, there is a pressing need for speech transcription and translation to increase the accessibility of information for everyone in society. This need comes in various forms such as live meetings, recorded videos, or phone calls. Therefore, we are developing a service that, with the help of AI, can automate these processes to efficiently use time and resources.
