YouTube2Transcripts (with Speaker Diarization)

A Python tool that downloads YouTube videos as audio and generates detailed transcriptions with speaker diarization using Google's Gemini AI. The tool supports batch processing of multiple URLs.

Example Result

Here is a sample transcript generated by the tool from the 48-minute video "Elon Musk - CEO of Tesla Motors and SpaceX | Entrepreneurship | Khan Academy". The full example is in ExampleTranscript.md.

Elon Musk: Um yeah, I mean I I I mean the goal is not uh just it's not sort of to become a bra big brand or to compete with uh Honda Civics uh but rather to advance the course of electric vehicles.

Interviewer: Hmm.

Elon Musk: So we're just going to keep making more and more electric cars and driving the price point down until the industry is very firmly electric. You know, like maybe half of all cars made are electric or something like that. Which is not to say that we expect to make all half of all cars. We we we want to just have that catalytic effect until at least that occurs. And I think at the point which there's you know we're approaching half of all new cars made are electric then I think that's I would consider that to be kind of the victory condition.

Interviewer: Wow.

Elon Musk: Um And and so the faster we can bring that day the the better.

Interviewer: Wow. When when would be your guess when that happens?

Elon Musk: Um well, I made a bet with someone about 3 years ago that it would be sooner than 20 years, so it's 17 years from now. But I but that's I think I I that's conservative. I think it's probably you know but maybe maybe 13 or 14 years, something like that.

Interviewer: Wow. Right right about the time Right when we're going to Mars. It'll it'll be it'll be exciting exciting times.

Elon Musk: Yeah. Absolutely. True. That's that's it just could yeah. Exactly. I was just thinking about that. It was like oh those time frames are kind of coincident.

So the tool transcribes the video and also identifies the speakers (more or less).

Features

Downloads YouTube videos as MP3 audio files
Splits long audio files into manageable chunks
Generates detailed transcriptions with speaker identification
Supports batch processing of multiple URLs
Includes rate limiting and retry mechanisms
Progress tracking with tqdm
Concurrent processing with ThreadPoolExecutor

Prerequisites

A Google Gemini API key (keys are issued through Google AI Studio)

Installation

Clone the repository:

git clone https://github.com/madeyexz/youtube2transcripts.git
cd youtube2transcripts

Install required packages:

pip install -r requirements.txt

Create a .env file in the project root and add your Gemini API key:

GEMINI_API_KEY=your_api_key_here
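A minimal sketch of how a script could pick up the key from that file. The helper names `load_env_file` and `get_gemini_api_key` are hypothetical, and the parser below uses only the standard library; the actual script may well rely on a package from requirements.txt instead.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_api_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY") or load_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to the .env file")
    return key
```

Failing early with a clear message is usually friendlier than letting the first API call fail with an authentication error.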

Usage

Create and activate a virtual environment:

uv venv .venv && source .venv/bin/activate && uv pip install -r requirements.txt

Run the script in one of two ways:

A. Using the GUI interface:
```
python run.py
```
This opens a graphical interface where you can paste a URL and read the resulting transcript.

B. Using the command-line interface:
```
python youtube_transcriber.py
```
When prompted, enter YouTube URLs one per line. Press Enter twice when done:
```
https://youtube.com/watch?v=example1
https://youtube.com/watch?v=example2
[Press Enter twice to start processing]
```
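The "one URL per line, blank line to finish" prompt can be sketched as below. `read_urls` is a hypothetical name, and the real script may read its input differently; this only illustrates the input convention described above.

```python
def read_urls(lines):
    """Collect URLs from an iterable of lines, stopping at the first blank line."""
    urls = []
    for line in lines:
        url = line.strip()
        if not url:
            # An empty line (the second Enter) ends the input.
            break
        urls.append(url)
    return urls


# Interactive use would look something like:
#   import sys
#   urls = read_urls(sys.stdin)
```

Taking any iterable of lines rather than calling `input()` directly keeps the function easy to test.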

The script will:

Download audio from each URL
Split audio into 20-minute chunks (Gemini has an output limit of about 8k tokens, which is roughly 30 minutes of speech, so 20 minutes leaves some headroom)
Process each chunk through Gemini 1.5 Flash
Generate and save transcripts in the transcripts_better directory
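To make the chunking step concrete, here is a small sketch that computes the chunk boundaries for a given audio length. `chunk_ranges` is a hypothetical helper; it only does the arithmetic and does not touch audio files, so how the script actually cuts the MP3 is not shown here.

```python
def chunk_ranges(total_seconds, chunk_minutes=20):
    """Return (start, end) second offsets covering the audio in fixed-size chunks."""
    if total_seconds < 0 or chunk_minutes <= 0:
        raise ValueError("total_seconds must be >= 0 and chunk_minutes must be > 0")
    chunk_seconds = chunk_minutes * 60
    ranges = []
    start = 0
    while start < total_seconds:
        # The last chunk is shorter when the length is not a multiple of the chunk size.
        end = min(start + chunk_seconds, total_seconds)
        ranges.append((start, end))
        start = end
    return ranges


# A 48-minute video (2880 seconds) yields two full chunks and one 8-minute remainder.
print(chunk_ranges(48 * 60))  # [(0, 1200), (1200, 2400), (2400, 2880)]
```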

Output

Transcripts are saved as markdown files in the transcripts_better directory with the following naming convention:

transcript_[sanitized_video_title].md
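The exact sanitizing rules are not documented here, so the following is only one plausible sketch: it drops characters that are awkward in file names and replaces whitespace with underscores. Both `sanitize_title` and `transcript_filename` are hypothetical names.

```python
import re


def sanitize_title(title):
    """Keep letters, digits, underscores, and hyphens; turn whitespace runs into underscores."""
    cleaned = re.sub(r"[^\w\s-]", "", title)
    cleaned = re.sub(r"\s+", "_", cleaned.strip())
    # Fall back to a fixed name if nothing usable is left.
    return cleaned or "untitled"


def transcript_filename(title):
    """Build a file name following the transcript_[sanitized_video_title].md convention."""
    return f"transcript_{sanitize_title(title)}.md"


print(transcript_filename("My Video: Part 1/2?"))  # transcript_My_Video_Part_12.md
```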

Configuration

The tool currently uses gemini-1.5-flash, the multimodal model that made this project possible.

The key constants below can be changed in the script, but changing them is not recommended, especially if you are not paying for the Gemini API.

CALLS_PER_SECOND: API rate limit (default: 1.8)
MAX_WORKERS: Maximum concurrent jobs (default: 2)
chunk_duration: Audio chunk size in minutes (default: 20)
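The sketch below shows how a calls-per-second limit and a worker cap can interact. `CALLS_PER_SECOND` and `MAX_WORKERS` are taken from the list above; the `RateLimiter` class and `process_all` function are assumptions made for illustration and are not the script's actual implementation.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CALLS_PER_SECOND = 1.8  # default from the list above
MAX_WORKERS = 2         # default from the list above


class RateLimiter:
    """Space calls at least 1 / calls_per_second apart, even across threads."""

    def __init__(self, calls_per_second):
        self.min_interval = 1.0 / calls_per_second
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self):
        # Reserve the next free slot under the lock, then sleep outside it
        # so other threads can reserve their own slots in the meantime.
        with self._lock:
            now = time.monotonic()
            scheduled = max(now, self._next_allowed)
            self._next_allowed = scheduled + self.min_interval
        delay = scheduled - now
        if delay > 0:
            time.sleep(delay)


def process_all(items, worker, limiter):
    """Run worker(item) for each item on a small thread pool, rate limited."""
    def limited(item):
        limiter.wait()
        return worker(item)

    with ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(limited, items))
```

With the defaults, calls start at most once every 1 / 1.8 ≈ 0.56 seconds no matter how many workers are running, which is why raising `MAX_WORKERS` alone does not increase the request rate.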

Error Handling

The script includes:

Automatic retries for failed API calls
Rate limiting to prevent API throttling
Comprehensive logging
File existence checks to prevent duplicate downloads
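As an illustration of the first and last points, here is a hedged sketch of a retry wrapper and a download guard. The attempt count, the exponential backoff, and the names `with_retries` and `needs_download` are all assumptions; the script's real retry policy is not specified here.

```python
import logging
import time
from pathlib import Path

logger = logging.getLogger(__name__)


def with_retries(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the original error.
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs", attempt, exc, delay)
            sleep(delay)


def needs_download(audio_path):
    """Return True only when the audio file is not already on disk."""
    return not Path(audio_path).exists()
```

Passing `sleep` as a parameter makes the backoff easy to test without real delays.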

Contributing

Contributions are welcome! Please feel free to submit a pull request.

Disclaimer

This tool is for educational purposes only. Please ensure you have the right to download and process any YouTube content before using this tool.
