This project is a Node.js application that uses an LLM-based agent to analyze PDF documents and answer questions about their contents. It combines document parsing, vector storage, and an AI agent to produce its responses.
- Node.js: The runtime environment for executing the TypeScript code.
- TypeScript: The programming language used for the project.
- LlamaIndex: A data framework for LLM-based applications.
- OpenAI: For accessing the GPT-4 language model.
- dotenv: For loading environment variables.
- fs/promises: For asynchronous file system operations.
- OpenAI API: Used for natural language processing tasks.
- Qdrant: Vector database for efficient similarity search.
- HuggingFace: Provides the embedding model.
- Document Parsing: Uses LlamaParseReader to parse PDF documents into markdown format.
- Vector Storage: Utilizes QdrantVectorStore for efficient storage and retrieval of document embeddings.
- Embedding: Employs HuggingFaceEmbedding with the "BAAI/bge-small-en-v1.5" model for text embedding.
- Query Engine: Implements a query engine for retrieving relevant information from the parsed documents.
- AI Agent: Uses OpenAIAgent with custom tools for answering queries and performing calculations.
- Caching: Implements a simple caching mechanism to avoid re-parsing previously processed documents.
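The caching idea in the last item can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the names `ParseCache`, `loadCache`, and `parseWithCache`, the JSON cache file, and the `parse` callback (a stand-in for the real parser call) are all assumptions. It uses the synchronous `node:fs` API for brevity, whereas the project itself lists `fs/promises`.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical cache shape: source file path -> parsed markdown.
type ParseCache = Record<string, string>;

// Read the cache file if it exists; otherwise start with an empty cache.
function loadCache(cachePath: string): ParseCache {
  if (!existsSync(cachePath)) return {};
  return JSON.parse(readFileSync(cachePath, "utf8")) as ParseCache;
}

// Parse only the files that are not in the cache yet.
// `parse` is a stand-in for the real document parser (an assumption).
function parseWithCache(
  files: string[],
  cachePath: string,
  parse: (file: string) => string,
): string[] {
  const cache = loadCache(cachePath);
  let dirty = false;
  const results: string[] = [];

  for (const file of files) {
    let parsed = cache[file];
    if (parsed === undefined) {
      parsed = parse(file); // cache miss: do the expensive work once
      cache[file] = parsed;
      dirty = true;
    }
    results.push(parsed);
  }

  // Persist the cache only when something new was added.
  if (dirty) writeFileSync(cachePath, JSON.stringify(cache, null, 2));
  return results;
}
```

Keying the cache by file path keeps the sketch short, but it would not notice a PDF that changes on disk under the same name. Keying by a content hash is one way to handle that case.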
- Clone the repository.
- Install dependencies with `npm install`.
- Install `tsx` globally (if not already installed): `npm install -g tsx`
  Note: Installing tsx globally allows you to run TypeScript files directly without compiling them first.
- Create a `.env` file with the following variables:
  - `OPENAI_API_KEY`: Your OpenAI API key
- Set up Qdrant:
  - Option 1: Ensure you have Qdrant running locally on port 6333.
  - Option 2: Run Qdrant in Docker (recommended for easy setup):
To run Qdrant in Docker, follow these steps:
- Make sure you have Docker installed on your system.
- Open a terminal and run the following command:
    docker run -p 6333:6333 -p 6334:6334 \
      -v $(pwd)/qdrant_storage:/qdrant/storage:z \
      qdrant/qdrant
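Before running the script, it can help to confirm that the environment variables from the setup steps are actually set. The sketch below is one possible check, not part of the project: `missingEnvVars` is a hypothetical helper, and it assumes the `.env` file has already been loaded into `process.env` (for example by dotenv).

```typescript
// Hypothetical helper: returns the names of required variables that are
// unset or contain only whitespace.
function missingEnvVars(
  env: Record<string, string | undefined>,
  required: string[],
): string[] {
  return required.filter((name) => !env[name]?.trim());
}

// Assumes the .env file has already been loaded into process.env.
const missingVars = missingEnvVars(process.env, ["OPENAI_API_KEY"]);
if (missingVars.length > 0) {
  console.warn(`Missing environment variables: ${missingVars.join(", ")}`);
}
```

Passing the environment in as an argument, rather than reading `process.env` inside the helper, makes the check easy to exercise with a plain object.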
Run the script with:
npx tsx index.ts
The project expects input data in PDF format. Place your PDF files in the `./data/` directory. The current setup uses the file `ICS_EUR_Ukraine_29AUG2023_PUBLIC.pdf`.
To add new documents for analysis:
- Place the PDF file in the `./data/` directory.
- Add the file path to the `filesToParse` array in the `index.ts` file.
Example:
const filesToParse = [
  "./data/ICS_EUR_Ukraine_29AUG2023_PUBLIC.pdf",
  "./data/YourNewDocument.pdf",
];
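A mistyped path in `filesToParse` would otherwise surface only once parsing starts, so a quick existence check can be useful. The following is a small sketch under that assumption. `missingFiles` is a hypothetical helper and is not taken from `index.ts`.

```typescript
import { existsSync } from "node:fs";

// Same shape as the array in index.ts; the entry is the file the README names.
const filesToParse = ["./data/ICS_EUR_Ukraine_29AUG2023_PUBLIC.pdf"];

// Hypothetical helper: returns the listed paths that do not exist on disk.
function missingFiles(files: string[]): string[] {
  return files.filter((file) => !existsSync(file));
}

const missingInputs = missingFiles(filesToParse);
if (missingInputs.length > 0) {
  console.warn(`These files were not found: ${missingInputs.join(", ")}`);
}
```

Paths are resolved relative to the current working directory, so the check should be run from the same directory as `npx tsx index.ts`.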