Summarize text using ChatGPT or a local LLM, with support for multiple large text files, PDF files and translation.
Area | Feature |
---|---|
File types | - Summarize text, markdown, HTML, PDF files |
Summarization levels | - Summarize at different levels: short, long, and per-paragraph |
Translation | - Translate to a target language |
Data sources | - Batch summarize whole directories of files - Download a file via URL and summarize it |
Private LLM | - Optionally use a locally hosted LLM, for maximum privacy and to prevent any loss of IP (Intellectual Property) |
Cost savings | - Avoid re-summarizing a previously processed file - Calculate cost estimates (when using OpenAI) |
Output files | - Output files in YAML format (as opposed to JSON): cheaper for the LLM to generate, easy for humans to read - Output files have a ".yaml.txt" file extension, for easy previewing and search in storage tools like Dropbox, SharePoint, or Google Drive |
- Python3
If running a local LLM:
- ctransformers
If using OpenAI ChatGPT:
- ChatGPT 3.5 Turbo [requires an OpenAI API key]
To see the available options:
./go.sh
or
python3 main_cli.py
Output:
Usage: main_cli.py <path to input file or input directory or URL> [options]
The options are:
[-l --language - The target output language. The default is set in config.py]
[-o --output - The output directory. The default is None, so output goes to stdout (no files are written).]
[-h --help]
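As a rough illustration of that option contract, here is a minimal sketch of how the flags could be parsed with argparse. This is not the actual implementation of main_cli.py, and the default language shown is an assumption (the real default lives in config.py):

```python
# Hypothetical sketch of the CLI contract - not the real main_cli.py.
import argparse

parser = argparse.ArgumentParser(prog="main_cli.py")
parser.add_argument("input", help="path to input file, input directory, or URL")
parser.add_argument("-l", "--language", default="English",  # real default is set in config.py
                    help="the target output language")
parser.add_argument("-o", "--output", default=None,
                    help="output directory; None means print to stdout only")
args = parser.parse_args()
print(args.input, args.language, args.output)
```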
- Copy the text you want to summarize into a file like data/input.txt.
Tip: unless using a local LLM, make sure the text does not contain commercially or personally sensitive information!
- Run the go.sh script:
./go.sh data/input.txt [options]
To summarize different file(s):
python3 main_cli.py <path to input text file or directory> [options]
gpt-summarizer can also summarize PDF files:
python3 main_cli.py <path to PDF file or directory> [options]
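The PDF support relies on PyMuPDF (pinned in the pip install lists below). As a rough sketch of the kind of extraction involved - not the tool's exact code - the text of a PDF can be pulled out like this:

```python
# Minimal PyMuPDF sketch: extract plain text from a PDF before summarizing.
import fitz  # PyMuPDF

doc = fitz.open("data/input.pdf")  # path is illustrative
text = "\n".join(page.get_text() for page in doc)
doc.close()
print(text[:500])  # this extracted text is what gets chunked and summarized
```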
The output is printed to STDOUT (terminal output):
=== === === [1] Summarizing './data/input.txt' === === ===
Summarizing file at './data/input.txt' into English...
=== === === [2] Short Summary = Chunk 1 of 1 === === ===
The study examines how language models perform with long contexts, finding that they struggle when relevant information is in the middle of the input. Performance decreases as context length increases, even for models designed for long contexts, offering insights for future model evaluation.
=== === === [3] FULL Short Summary === === ===
The study examines how language models perform with long contexts, finding that they struggle when relevant information is in the middle of the input. Performance decreases as context length increases, even for models designed for long contexts, offering insights for future model evaluation.
=== === === [4] FULL Long Summary === === ===
The research delves into the performance of language models when processing long contexts, revealing that models face challenges when relevant information is located in the middle of the input. As the context length grows, performance diminishes, impacting tasks like multi-document question answering and key-value retrieval. This study sheds light on how language models utilize input contexts and proposes new evaluation methods for forthcoming long-context models.
=== === === [5] FULL paragraphs Summary === === ===
Recent language models can handle long contexts but struggle with utilizing longer context effectively.
Performance is highest when relevant information is at the beginning or end of the input context.
Models face significant degradation when required to access relevant information in the middle of long contexts.
Performance decreases as the input context length increases, even for models explicitly designed for long contexts.
The study offers insights into how language models utilize input context and suggests new evaluation protocols for future long-context models.
-- THIS FILE time: 0:00:05s
-- THIS FILE estimated cost: $0.0006715
=== === === [6] Completed === === ===
1 files processed in 0:00:05s
-- Total estimated cost: $0.0006715
See also an example of summarizing this README.
Large files are broken into chunks for processing, with a single concatenated final output.
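The exact chunking logic lives in the tool itself; a minimal sketch of the general idea (fixed-size chunks, summarized one by one, then concatenated) looks like this - the chunk size and function names are assumptions, not gpt-summarizer's real code:

```python
# Hypothetical sketch of chunked summarization - illustrative only.
CHUNK_SIZE = 8000  # characters per chunk (example value)

def split_into_chunks(text, size=CHUNK_SIZE):
    return [text[i:i + size] for i in range(0, len(text), size)]

def summarize_large_text(text, summarize_chunk):
    # summarize_chunk is whatever LLM call is configured (OpenAI, ollama, or ctransformers)
    summaries = [summarize_chunk(chunk) for chunk in split_into_chunks(text)]
    return "\n".join(summaries)  # single concatenated final output
```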
Costs are estimated using the figures in config.py.
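The estimate is essentially token counts multiplied by per-token prices; a tiny sketch, with made-up constant names standing in for whatever config.py actually defines:

```python
# Illustrative cost arithmetic only - the real price figures and names are in config.py.
COST_PER_1K_INPUT_TOKENS = 0.0005   # assumed example price
COST_PER_1K_OUTPUT_TOKENS = 0.0015  # assumed example price

def estimate_cost(input_tokens, output_tokens):
    return (input_tokens / 1000) * COST_PER_1K_INPUT_TOKENS \
         + (output_tokens / 1000) * COST_PER_1K_OUTPUT_TOKENS
```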
If an output directory is specified as an option, then each input file has an equivalent output file, in YAML format.
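As a rough sketch of that idea (pyyaml is already a dependency), a summary could be written with a ".yaml.txt" extension like this - the field names here are illustrative, not the tool's real output schema:

```python
# Hypothetical YAML output sketch - field names are not gpt-summarizer's real schema.
import yaml

result = {
    "short_summary": "...",
    "long_summary": "...",
    "paragraph_summaries": ["...", "..."],
}

with open("output/input.yaml.txt", "w", encoding="utf-8") as f:
    yaml.safe_dump(result, f, allow_unicode=True)
```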
gpt-summarizer can be used in 2 ways:
1 - via a remote LLM on OpenAI (ChatGPT)
2 - OR via a local LLM (see the model types supported by ctransformers)
First, edit config.py according to whether you can use GPU acceleration:
- If you have an NVIDIA graphics card and have also installed CUDA, then set IS_GPU_ENABLED to True.
- Otherwise, set it to False.
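For example, the relevant line in config.py would look roughly like this (other settings omitted):

```python
# config.py (excerpt - only the GPU flag shown)
IS_GPU_ENABLED = True  # set to False if you have no CUDA-capable NVIDIA GPU
```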
- Install openai Python client.
pip install cornsnake==0.0.60 html2text==2024.2.26 json5==0.9.25 ollama==0.2.0 openai==1.23.6 PyMuPDF==1.24.1 pyyaml==6.0.1 ruff==0.3.5
- Get an OpenAI API key
- Set an environment variable with your OpenAI key:
export OPENAI_API_KEY="xxx"
Add that to your shell initialization script (~/.zprofile or similar), then load it in the current terminal:
source ~/.zprofile
- Set config.py to use OpenAI (see the excerpt below):
Set the value of LOCAL_CTRANSFORMERS_MODEL_FILE_PATH to be "".
Set the value of OLLAMA_MODEL_NAME to be "".
- Install ollama - see the ollama site
- Pull a compatible model - for example llama3 or phi3 [depending on your hardware]
ollama pull llama3
- Run ollama
ollama serve
- Configure gpt-summarizer to use ollama
Edit config.py - set OLLAMA_MODEL_NAME to the name of the model pulled in step 2.
Set the value of LOCAL_CTRANSFORMERS_MODEL_FILE_PATH to be "".
- Install Python libraries:
pip install cornsnake==0.0.60 html2text==2024.2.26 json5==0.9.25 ollama==0.2.0 PyMuPDF==1.24.1 pyyaml==6.0.1 ruff==0.3.5
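Under the hood, the ollama Python client (pinned above) talks to the running "ollama serve" process. A minimal sketch of that kind of call - the prompt wording is illustrative, not necessarily how gpt-summarizer structures its prompts - is:

```python
# Minimal ollama client sketch - the prompt wording is illustrative only.
import ollama

response = ollama.chat(
    model="llama3",  # the model name pulled in step 2, matching OLLAMA_MODEL_NAME
    messages=[{"role": "user", "content": "Summarize this text: ..."}],
)
print(response["message"]["content"])
```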
Tip: if you find there are many retries when parsing the LLM output, then try switching between JSON and YAML.
YAML is generally cheaper and faster, but some LLMs may be more reliable if asked to output JSON.
For a local LLM, you can decide which format to use via config.py:
- edit the value of is_local__json_not_yaml
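For example (the exact meaning of the flag is inferred from its name, so treat the comment as an assumption):

```python
# config.py (excerpt) - assumed semantics: True asks the local LLM for JSON, False for YAML
is_local__json_not_yaml = False
```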
- Install the ctransformers Python library
pip3 install --upgrade ctransformers pymupdf
- Download a compatible model. To know what model types are supported, see the ctransformers project.
Quality models are available on Hugging Face - see TheBloke.
Example: https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin
OR via bash:
./download-llama-2-13B-model.sh
- Edit config.py
Set LOCAL_CTRANSFORMERS_MODEL_FILE_PATH to the path to the downloaded model file.
Set OLLAMA_MODEL_NAME to be "".
- see GPU README
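Once the model file is downloaded, ctransformers loads it directly from disk. A minimal sketch of that kind of call (the model path and gpu_layers value are illustrative, not gpt-summarizer's actual wrapper code):

```python
# Minimal ctransformers sketch - illustrative only.
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "llama-2-13b-chat.ggmlv3.q4_0.bin",  # the path set in LOCAL_CTRANSFORMERS_MODEL_FILE_PATH
    model_type="llama",
    gpu_layers=50,  # only useful with a CUDA-capable GPU; use 0 for CPU-only
)
print(llm("Summarize the following text in one sentence: ..."))
```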
- tactic 1: use delimiters to denote what is the 'data input' (see the example after this list)
- tactic 2: ask for structured output (as opposed to journalistic or casual informal style)
- tactic 3: ask the model to check whether conditions are satisfied
- tactic 4: few-shot prompting -> give successful examples of completing tasks, then ask the model to perform the task.
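For instance, a delimiter-based summarization prompt (wording is illustrative, not the tool's actual prompt template) might look like:

```python
# Illustrative prompt using delimiters - not gpt-summarizer's actual prompt template.
text_to_summarize = "..."  # the 'data input'

prompt = (
    "Summarize the text delimited by <text></text> tags in one short paragraph.\n"
    "If the text is empty, reply exactly with: NO CONTENT.\n"
    f"<text>{text_to_summarize}</text>"
)
```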
- tactic 1: Instruct the model to spend more time! (outline specific steps to take)
- tactic 2: Instruct the model to work out its own solution before rushing to a conclusion
- analyze errors and try to improve the prompt
- try to include context (if it is too big, summaries can be used)
- Summarizing
- Inferring
- Transforming (language, tone, format)
- Expanding
- the model does not know the boundary of its knowledge -> if asked about a topic it has little knowledge of, it makes plausible but false statements -> hallucinations!
mitigations:
- ask the model to include a warning if it is not sure
- ask it to use relevant information
- (ask it to provide links to the source of the information)
Inspired by an excellent DeepLearning.ai course: Prompt Engineering for Developers