
Enhancement: Improve documentation regarding audio ingestion pipeline to inform the user when to use VAD #431

Open
TomLucidor opened this issue Nov 17, 2024 · 6 comments

@TomLucidor

(Just a bit of a personal note that could be included in the ReadMe)

  1. Install Docker Desktop and make sure it is running in the background
  2. Download the following Dockerfile (assuming your machine has an NVIDIA GPU): https://github.com/rmusser01/tldw/blob/main/Helper_Scripts/Dockerfiles/tldw-nvidia_amd64_Dockerfile
  3. Save this file in its own folder and rename it to Dockerfile, removing any .txt extension
  4. Follow the commented instructions at the top of the file and run the first 2 commands
  5. Bob's ya uncle m8
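The "first 2 commands" in step 4 are typically a `docker build` followed by a `docker run`; a minimal sketch is below, assuming an image name, port, and GPU flag that are my own guesses — the comments at the top of the actual Dockerfile are authoritative:

```shell
# Build the image from the folder containing the renamed Dockerfile
# (the tag "tldw" is an arbitrary example)
docker build -t tldw .

# Run it detached with GPU access and an exposed web UI port
# (port 8000 and --gpus all are assumptions; check the Dockerfile comments)
docker run -d --gpus all -p 8000:8000 --name tldw tldw
```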
@TomLucidor (Author)

Currently checking whether:

  • There are not enough instructions on when to use "Speaker Diarization" (e.g. a monologue vs. a dialogue) or "VAD" (e.g. a densely packed video essay)
  • Whisper distil-large seems to bug out once in a while; it would be good to have a recommended-model guide
  • Connecting Dockerized TL;DW to Ollama is hard, since there is no obvious way to set which model is used (e.g. Phi-3)

@rmusser01 (Owner)

Thanks for the suggestions/feedback; responding in order:

  1. The tldw-nvidia_amd64_Dockerfile is the same as the Dockerfile in the root of the repo (it should be — a cursory glance says it's mostly the same, minus comments). The .txt extension is added by your local system, not by how the file is stored. It's also my assumption that anyone who stumbles upon this project and is looking to get it working with Docker, or is aware of Docker, already has it installed.
  • That being said, the current documentation is very lacking and is on my to-do list, after I get more features in, as those are higher priority for me right now.
  2. Speaker Diarization/VAD: this pipeline needs to be tweaked/re-examined, as the diarization is pretty bad. I also believe it is unoptimized and as such could do with some improvement. This was on my to-do list last week, but things got in the way, so it is still 'in waiting'.
  3. As with the diarization/VAD point, the pipeline needs some tweaking. I do agree on having a default model/recommendations. I personally use large-v2, as I have found it to have the fewest issues with the current pipeline; large-v3 just misses the majority of speech, VAD or not.
  4. I'm not sure what you mean here, as this is reliant on your personal setup. Are you referring to connecting the Docker container to the outside (of Docker) network so that Ollama can successfully communicate with it?
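To make the "when to use VAD" question concrete: a VAD stage trims non-speech spans before transcription, which mainly pays off on audio with long silences and can hurt on densely packed speech if it clips segment edges. A minimal illustration of the idea, using a naive RMS-energy threshold on synthetic audio (real frontends such as Silero VAD are far more robust — this is not tldw's actual pipeline):

```python
# Toy VAD sketch: mark frames whose RMS energy exceeds a threshold.
# Illustrative only; names and thresholds here are arbitrary choices.
import math

SAMPLE_RATE = 16_000
FRAME_MS = 30
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # samples per frame

def rms(frame):
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def detect_speech(samples, threshold=0.05):
    """Return (start, end) sample ranges whose frame RMS exceeds threshold."""
    regions, start = [], None
    for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        active = rms(samples[i:i + FRAME_LEN]) > threshold
        if active and start is None:
            start = i                       # speech region begins
        elif not active and start is not None:
            regions.append((start, i))      # speech region ends
            start = None
    if start is not None:
        regions.append((start, len(samples)))
    return regions

# Synthetic clip: 1 s silence, 1 s of a 440 Hz tone standing in for
# speech, then 1 s silence.
tone = [0.5 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
        for t in range(SAMPLE_RATE)]
silence = [0.0] * SAMPLE_RATE
clip = silence + tone + silence

regions = detect_speech(clip)
print(regions)  # one region covering roughly the middle second
```

On a monologue with pauses, dropping the silent regions saves the transcriber work; on a fast-cut video essay with near-continuous speech there is little silence to trim, which matches the "packed video essay" caveat in the bullet list above.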

@rmusser01 rmusser01 self-assigned this Nov 17, 2024
@rmusser01 rmusser01 added this to the User-Filed-Issues milestone Nov 17, 2024
@TomLucidor (Author)

@rmusser01 thanks for the reply. The first numbered list was me reminding myself how to install TL;DW with Docker, but I would love for you to address the bullet points instead (along, hopefully, with more documentation and guides). Right now I would especially focus on how to get the Docker setup to recognize that I need the Phi-3 SLM (and maybe an embedding model) from Ollama to do summarization. The pipeline, though, definitely needs more docs along with reworks, since people might not have been explicit about their use case or video genre type... should I explain more?


rmusser01 commented Nov 19, 2024

Edit: Ah, I understand what you meant.
I will continue to prioritize as I have, and will address the documentation as I get to it.
Explaining how to use Docker or Ollama is not in the scope of this project. A current user is expected to know how to use Docker with custom Dockerfiles, or Docker volumes for external storage, if they are pursuing usage with Docker.
It is also assumed that a user attempting to use Ollama will have no issue understanding how its API works, so that they can ensure network connectivity is successful between the two.
I will edit this issue to reflect your suggestion of improving the documentation regarding the audio ingestion pipeline.
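For readers hitting the connectivity question above: from inside a container, `localhost` is the container itself, so a common pattern is to address the host's Ollama server via `host.docker.internal` (built in on Docker Desktop; on Linux, add `--add-host=host.docker.internal:host-gateway` to `docker run`). A sketch that only builds such a request against Ollama's `/api/generate` endpoint without sending it — the model name "phi3" is just an example:

```python
# Construct (but do not send) a request to an Ollama server on the
# Docker host. Endpoint shape follows Ollama's /api/generate API.
import json
import urllib.request

OLLAMA_URL = "http://host.docker.internal:11434/api/generate"

def build_request(model, prompt):
    payload = json.dumps({
        "model": model,       # e.g. "phi3" -- must already be pulled in Ollama
        "prompt": prompt,
        "stream": False,      # ask for a single JSON response
    }).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"})

req = build_request("phi3", "Summarize: hello world")
print(req.full_url)
```

Sending it would be `urllib.request.urlopen(req)`, which only succeeds once the container can actually resolve and reach the host's port 11434.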

@rmusser01 rmusser01 changed the title Documentation note: How to use Docker Enhancement: Improve documentation regarding audio ingestion pipeline to inform the user when to use VAD Nov 19, 2024
@rmusser01 rmusser01 added the enhancement New feature or request label Nov 19, 2024

TomLucidor commented Nov 20, 2024

@rmusser01 using Docker and Ollama on their own is fairly easy; however, I am facing a weird problem setting which Ollama model to use in TL;DW, since there is no such setting in the panel, and the recommendation was to edit the config.txt file (which is packed within the Docker container's file system and is slightly less streamlined to use).
I will keep testing features from this panel and others to see what an average user with moderate experience would face, and draft notes accordingly.
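As a generic workaround for a config file baked into a container's filesystem, `docker cp` lets you pull it out, edit it locally, and push it back. The container name and path below are hypothetical placeholders, not tldw's actual layout:

```shell
# Copy the config out of the running container
# ("tldw" and the /app/... path are hypothetical examples)
docker cp tldw:/app/config.txt ./config.txt

# ... edit config.txt locally, e.g. set the Ollama model ...

# Copy it back and restart so the app rereads it
docker cp ./config.txt tldw:/app/config.txt
docker restart tldw

# Alternatively, bind-mount the file at run time so local edits persist:
# docker run -v "$PWD/config.txt:/app/config.txt" ... tldw
```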


rmusser01 commented Nov 22, 2024

FYI @TomLucidor, you can edit the config file from the web UI.

@rmusser01 rmusser01 modified the milestones: v1.0 Release, Beta V9, Beta v10 Dec 3, 2024