PandaETL is an open-source, no-code ETL (Extract, Transform, Load) tool designed to extract and parse data from various document types including PDFs, emails, websites, audio files, and more. With an intuitive interface and powerful backend, PandaETL simplifies the process of data extraction and transformation, making it accessible to users without programming skills.
- 📝 No-Code Interface: Easily set up and manage ETL processes without writing a single line of code.
- 📄 Multi-Document Support: Extract data from PDFs, emails, websites, audio files, and more.
- 🔧 Customizable Workflows: Create and customize extraction workflows to fit your specific needs (coming soon).
- 🔗 Extensive Integrations: Integrate with various data sources and destinations (coming soon).
- 💬 Chat with Documents: Chat with your documents to retrieve information and answer questions (coming soon).
- Node.js and npm (or yarn)
- Python 3.x
- Conda
- Poetry (Python package manager)
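Before installing, it can help to confirm that the prerequisites above are available on your PATH. The following is a minimal sketch and not part of PandaETL: `check_tool` is a hypothetical helper name, the executable names are assumptions (Python may be installed as `python` rather than `python3`, and you may use `yarn` instead of `npm`), and it only checks that each command exists, not its version.

```shell
#!/bin/sh
# Hypothetical helper (not part of PandaETL): report whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1"
  else
    echo "missing: $1"
    return 1
  fi
}

# Executable names are assumptions based on the prerequisites list;
# adjust them to match your system.
for tool in node npm python3 conda poetry; do
  check_tool "$tool" || true
done
```

Any line starting with `missing:` points to a tool to install before continuing.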
- Clone the repository:
git clone https://github.com/yourusername/panda-etl.git
cd panda-etl
- Navigate to the frontend directory:
cd frontend
- Install dependencies (including Husky):
yarn install
- Create a .env file in the frontend directory with the following:
NEXT_PUBLIC_API_URL=http://localhost:3000/api/v1
NEXT_PUBLIC_STORAGE_URL=http://localhost:3000/api/assets
or copy the .env.example file to .env
- Run the development server:
npm run dev # or yarn dev
- Open http://localhost:3000 with your browser to see the result.
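The `.env` step above can be scripted so that an existing file is never overwritten. This is a minimal sketch, assuming it is run from the frontend directory; `ensure_env` is a hypothetical helper name, not something the project ships, and the fallback values are the two variables listed in that step.

```shell
# Hypothetical helper (not part of PandaETL): create .env only if it is missing.
ensure_env() {
  if [ -f .env ]; then
    echo ".env already exists, leaving it untouched"
  elif [ -f .env.example ]; then
    # Prefer the example file shipped with the repository.
    cp .env.example .env
    echo "copied .env.example to .env"
  else
    # Fall back to the two values listed in the setup step.
    printf '%s\n' \
      'NEXT_PUBLIC_API_URL=http://localhost:3000/api/v1' \
      'NEXT_PUBLIC_STORAGE_URL=http://localhost:3000/api/assets' > .env
    echo "wrote default .env"
  fi
}
```

Call `ensure_env` once from the frontend directory before starting the development server. Checking for an existing file first keeps any local changes you have already made.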
- Navigate to the backend directory:
cd backend
- Create and activate a Conda environment:
conda create -n pandaetl python=3.x
conda activate pandaetl
- Install Poetry within the Conda environment:
conda install poetry
- Install dependencies using Poetry (including pre-commit):
poetry install
- Set up pre-commit hooks:
poetry run pre-commit install
- Create an environment file from the example:
cp .env.example .env
- Apply database migrations:
poetry run make migrate
- Start the backend server:
poetry run make run
- Navigate to the "Projects" page.
- Click on "New Project".
- Fill in the project details and click "Create".
- Open a project and navigate to the "Processes" tab.
- Click on "New Process".
- Follow the steps to configure your extraction process.
Stay tuned for our upcoming feature that allows you to chat with your documents, making data retrieval even more interactive and intuitive.
We welcome contributions from the community. To contribute:
- Fork the repository.
- Create a new branch for your feature or bugfix.
- Commit your changes and push to your fork.
- Create a pull request with a detailed description of your changes.
This project is licensed under the MIT Expat License. See the LICENSE file for details.
We would like to thank all the contributors and the open-source community for their support.
For any questions or feedback, please open an issue on GitHub.
This project uses pre-commit hooks in the backend and Husky in the frontend to ensure code quality and consistency.
Husky is set up in the frontend to run linting checks before each commit.
To manually run the frontend linting: