Commit

changelog and docs
NotJoeMartinez committed Jul 6, 2024
1 parent 6c59219 commit a5267a8
Showing 4 changed files with 49 additions and 13 deletions.
29 changes: 25 additions & 4 deletions CHANGELOG.md
@@ -4,6 +4,27 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](http://keepachangelog.com/).

## [0.1.52] - 2024-07-06
### Added
- `llm` command for Retrieval-Augmented Generation on channels with embeddings
- https://github.com/NotJoeMartinez/yt-fts/pull/156
- Way to specify time interval when generating embeddings
- https://github.com/NotJoeMartinez/yt-fts/pull/155
- pytest unit testing for basic cli functionality
- https://github.com/NotJoeMartinez/yt-fts/pull/151
### Changed
- Changed `get-embeddings` command to `embeddings` (it's cleaner)
- https://github.com/NotJoeMartinez/yt-fts/pull/155
- Reformatted most files to follow PEP 8 style guides
- https://github.com/NotJoeMartinez/yt-fts/pull/153
- Most commands now exit with an explicit status code
- https://github.com/NotJoeMartinez/yt-fts/pull/152
- Refactored to not use `import *`
- https://github.com/NotJoeMartinez/yt-fts/pull/154
### Fixed
- Removed Regex warning when first running cli
- Delete not working if you use a capital Y

## [0.1.51] - 2024-07-04
### Fixed
- Fixed broken `get_channel_id` function caused by YouTube's change to the video page HTML
@@ -53,7 +74,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/).

### Added
- [yt-fts-132](https://github.com/NotJoeMartinez/yt-fts/pull/132)
- Github actions integration
- GitHub actions integration



@@ -86,7 +107,7 @@ Special thanks to [@danlamanna](https://github.com/danlamanna) for these fixes
## [0.1.39] - 2023-12-31
### Fixed
- [yt-fts-118](https://github.com/NotJoeMartinez/yt-fts/pull/118)
- Major: Fixed bug where download will fail if channel does not have live stream page
- Major: Fixed bug where download will fail if channel does not have live-stream page

## [0.1.38] - 2023-12-29
### Added
@@ -106,7 +127,7 @@ Special thanks to [@danlamanna](https://github.com/danlamanna) for these fixes
## [0.1.36] - 2023-12-25
### Fixed
- [yt-fts-112](https://github.com/NotJoeMartinez/yt-fts/pull/112)
- Medium: Fixed issue with download command not downloading live streamed videos
- Medium: Fixed issue with download command not downloading live-streamed videos

### Added
- [yt-fts-111](https://github.com/NotJoeMartinez/yt-fts/pull/111)
@@ -176,5 +197,5 @@ Special thanks to [@danlamanna](https://github.com/danlamanna) for these fixes

- [yt-fts-67](https://github.com/NotJoeMartinez/yt-fts/issues/67)

Minor: YouTube URL validation now allows for /@channelName and /channle/channelID
Minor: YouTube URL validation now allows for /@channelName and /channel/channelID
instead of forcing /@channel/videos.
29 changes: 22 additions & 7 deletions README.md
@@ -1,12 +1,15 @@

# yt-fts - Youtube Full Text Search
`yt-fts` is a command line program that uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to scrape all of a youtube channels subtitles and load them into an sqlite database that is searchable from the command line. It allows you to query a channel for specific key word or phrase and will generate time stamped youtube urls to
# yt-fts - YouTube Full Text Search
`yt-fts` is a command line program that uses [yt-dlp](https://github.com/yt-dlp/yt-dlp) to scrape all of a YouTube
channel's subtitles and load them into a SQLite database that is searchable from the command line. It allows you to
query a channel for a specific keyword or phrase and will generate time-stamped YouTube URLs to
the video containing the keyword.

It also supports semantic search via the [OpenAI embeddings API](https://beta.openai.com/docs/api-reference/) using [chromadb](https://github.com/chroma-core/chroma).

- [Blog Post](https://notjoemartinez.com/blog/youtube_full_text_search/)
- [Semantic Search](#Semantic-Search-via-OpenAI-embeddings-API)
- [LLM/RAG Chat Bot](#llm-chat-bot)
- [Semantic Search](#vsearch-semantic-search)
- [CHANGELOG](CHANGELOG.md)

https://github.com/NotJoeMartinez/yt-fts/assets/39905973/6ffd8962-d060-490f-9e73-9ab179402f14
@@ -90,7 +93,7 @@ yt-fts search "rea* kni* Mali*" --channel "The Tim Dillon Show"
```


# Semantic Search
# Semantic Search and RAG
You can enable semantic search for a channel by using the `embeddings` command.
This requires an OpenAI API key set in the environment variable `OPENAI_API_KEY`, or
you can pass the key with the `--openai-api-key` flag.
@@ -106,11 +109,12 @@ Fetches OpenAI embeddings for specified channel
yt-fts embeddings --channel "3Blue1Brown"

# specify the time interval, in seconds, used to split the text (default: 10)
# the larger the interval, the more accurate the llm response,
# but semantic search results will contain more text for you to read.
yt-fts embeddings --interval 60 --channel "3Blue1Brown"
```
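
To make the effect of `--interval` concrete, here is a minimal sketch of how subtitle segments could be grouped into fixed-width time windows before embedding. It is an illustration only, not the code `yt-fts` uses: the function name `group_by_interval` and the `(start_seconds, text)` segment shape are assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Hypothetical segment shape: (start time in seconds, subtitle text).
Segment = Tuple[float, str]


def group_by_interval(segments: List[Segment], interval: int = 10) -> List[Tuple[int, str]]:
    """Merge subtitle segments whose start times fall in the same time window.

    Returns (window_start_seconds, joined_text) pairs in chronological order.
    """
    buckets: Dict[int, List[str]] = {}
    for start, text in segments:
        window = int(start // interval)  # index of the window this segment belongs to
        buckets.setdefault(window, []).append(text)
    return [(w * interval, " ".join(buckets[w])) for w in sorted(buckets)]


segments = [(0.0, "hello"), (4.5, "world"), (12.0, "this is"), (61.0, "later")]

chunks_10 = group_by_interval(segments, 10)
chunks_60 = group_by_interval(segments, 60)

print(chunks_10)  # [(0, 'hello world'), (10, 'this is'), (60, 'later')]
print(chunks_60)  # [(0, 'hello world this is'), (60, 'later')]
```

With a 60 second window the same transcript yields fewer, longer chunks, which is the trade-off described in the comment above: more context per chunk, but more text to read in each search result.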

After the embeddings are saved you will see a `(ss)` next to the channel name when you
list channels and you will be able to use the `vsearch` command for that channel.
list channels, and you will be able to use the `vsearch` command for that channel.

## `vsearch` (Semantic Search)
`vsearch` is for "Vector search". This requires that you enable semantic
@@ -133,11 +137,21 @@ yt-fts vsearch "[search query]" --export --channel "[channel name or id]"

```

## `llm` (Chat Bot)
Starts an interactive chat session with OpenAI's `gpt-4o` model, using
the semantic search results of your initial prompt as the context
for answering questions. If it can't answer your question, it has a
mechanism to update the context by running a targeted query based
on the conversation. The channel must have semantic search enabled.

```sh
yt-fts llm --channel "3Blue1Brown" "How does back propagation work?"
```
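
The retrieve, answer, and refresh-context loop described above can be sketched roughly as follows. This is a hedged illustration of the general pattern, not the implementation in `yt-fts`: the `NEED_CONTEXT:` marker, the function `answer_with_context`, and the stub `fake_search` and `fake_model` callables are all hypothetical, and no real OpenAI or chromadb calls are made.

```python
from typing import Callable, List

NEED_CONTEXT = "NEED_CONTEXT:"  # hypothetical marker the model uses to request more context


def answer_with_context(
    question: str,
    search: Callable[[str], List[str]],
    ask_model: Callable[[str, List[str]], str],
    max_refreshes: int = 1,
) -> str:
    """Answer a question from retrieved passages, refreshing the context if asked to."""
    context = search(question)  # the initial prompt seeds the context
    reply = ask_model(question, context)
    refreshes = 0
    while reply.startswith(NEED_CONTEXT) and refreshes < max_refreshes:
        follow_up = reply[len(NEED_CONTEXT):].strip()  # targeted query from the model
        context = context + search(follow_up)
        reply = ask_model(question, context)
        refreshes += 1
    return reply


# Stubs standing in for semantic search and the chat model (assumptions for this example).
CORPUS = {
    "back propagation": ["Backprop computes gradients layer by layer."],
    "chain rule": ["The chain rule links each layer's derivative."],
}


def fake_search(query: str) -> List[str]:
    return [text for key, texts in CORPUS.items() if key in query.lower() for text in texts]


def fake_model(question: str, context: List[str]) -> str:
    if not any("chain rule" in passage.lower() for passage in context):
        return NEED_CONTEXT + " chain rule"
    return f"Answered using {len(context)} passages"


answer = answer_with_context("How does back propagation work?", fake_search, fake_model)
print(answer)  # Answered using 2 passages
```

Passing the search and model functions in as parameters keeps the loop testable without network access; a real session would swap the stubs for calls to the vector store and the chat model.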

## How To

**Export search results:**

For both the `search` and `vsearch` commands you can export the results with
the `--export` flag, which saves them to a CSV file in the current directory.
```bash
@@ -163,7 +177,8 @@ yt-fts update --channel "3Blue1Brown"


**Export all of a channel's transcript:**
This command will create a directory in current working directory with the youtube

This command will create a directory in the current working directory with the YouTube
channel id of the specified channel.
```bash
# Export to vtt
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"

[project]
name = "yt-fts"
version = "0.1.51"
version = "0.1.52"
description = "Search all of a YouTube channel from the command line"
readme = "README.md"
requires-python = ">=3.8"
2 changes: 1 addition & 1 deletion yt_fts/yt_fts.py
@@ -26,7 +26,7 @@
show_message
)

YT_FTS_VERSION = "0.1.51"
YT_FTS_VERSION = "0.1.52"
console = Console()


