Problem: open-source maintainers spend a lot of time managing duplicate/related (doppelgänger) issues & pull requests
Solution: doppelgänger compares newly submitted issues/PRs against existing ones to automatically flag duplicate/related (doppelgänger) issues/PRs
Topics: vector db, github, open-source, embedding search, rag, similarity scores
This application is a GitHub App that automatically compares newly opened issues with existing ones, closing and commenting on highly similar issues to reduce duplication. In addition, it comments on PRs with feedback and points to consider, based on the PR title and description.
Each issue['title'] and issue['body'] is converted into a vector representation using MiniLM-L6-v2.
Each vector is persisted in ChromaDB, and similarity search uses ChromaDB's built-in cosine similarity search. Alongside each vector, issue_id and issue['title'] are stored using ChromaDB's metadata argument.
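To make the storage and lookup step concrete, here is a minimal in-memory sketch of nearest-neighbour search by cosine distance with metadata attached to each vector. It is a stand-in for illustration only: TinyIssueIndex and its methods are hypothetical names, not ChromaDB's API, and the toy 3-dimensional vectors stand in for real MiniLM-L6-v2 embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine_distance(a: List[float], b: List[float]) -> float:
    """Return 1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


class TinyIssueIndex:
    """Hypothetical in-memory stand-in for a vector store collection."""

    def __init__(self) -> None:
        self._rows: List[Tuple[List[float], Dict[str, object]]] = []

    def add(self, vector: List[float], issue_id: int, title: str) -> None:
        # Metadata is kept next to the vector so a match can be reported
        # by issue id and title.
        self._rows.append((vector, {"issue_id": issue_id, "title": title}))

    def nearest(self, query: List[float]) -> Tuple[float, Dict[str, object]]:
        """Return (distance, metadata) for the closest stored vector."""
        if not self._rows:
            raise LookupError("index is empty")
        return min(
            ((cosine_distance(query, vec), meta) for vec, meta in self._rows),
            key=lambda pair: pair[0],
        )


# Toy vectors; a real embedding model would produce these from title and body.
index = TinyIssueIndex()
index.add([1.0, 0.0, 0.0], issue_id=1, title="Crash on startup")
index.add([0.0, 1.0, 0.0], issue_id=2, title="Add dark mode")
distance, match = index.nearest([0.9, 0.1, 0.0])
```

The query vector points almost the same way as the first stored vector, so the lookup returns issue 1 with a distance close to 0.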
SIMILARITY_THRESHOLD (i.e. the distance d at which we consider two issues "similar") is configurable, and can be set to any decimal between 0 and 1 [1].
Doppelganger will close a newly submitted issue when the cosine distance d between it and the most similar existing issue is greater than this threshold. Otherwise, if d is greater than (SIMILARITY_THRESHOLD*0.5), it will leave a helpful comment indicating the most similar/related issue.
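The two-tier rule can be sketched as a small function. This mirrors the comparison exactly as the paragraph above states it. The function name and the returned labels are hypothetical, and how the app derives d from ChromaDB's result is not shown here.

```python
SIMILARITY_THRESHOLD = 0.5  # default value stated in this README


def decide_action(d: float, threshold: float = SIMILARITY_THRESHOLD) -> str:
    """Map a score d to 'close', 'comment', or 'ignore' (hypothetical labels)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    if d > threshold:
        # Above the full threshold: treat as a duplicate and close.
        return "close"
    if d > threshold * 0.5:
        # Above half the threshold: related enough to point at the match.
        return "comment"
    return "ignore"
```

With the default threshold of 0.5, a value of 0.8 leads to closing, 0.3 to a comment, and 0.1 to no action. Values exactly equal to a cut-off fall into the lower tier because the comparisons are strict.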
Issues and pull requests are stored in ChromaDB collections per repository.
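One way to derive a per-repository collection name is sketched below. The naming scheme, the function name, and the "issues"/"pulls" suffixes are assumptions for illustration; the app's actual scheme may differ, and ChromaDB's own naming rules should be checked before relying on it.

```python
import re


def collection_name(repo_full_name: str, kind: str) -> str:
    """Build a collection name from 'owner/repo' and a kind (hypothetical scheme)."""
    if kind not in ("issues", "pulls"):
        raise ValueError("kind must be 'issues' or 'pulls'")
    # Replace anything other than letters, digits, '_' and '-' (such as the
    # '/' in 'owner/repo') with a single '-'.
    slug = re.sub(r"[^A-Za-z0-9_-]+", "-", repo_full_name).strip("-")
    if not slug:
        raise ValueError("repository name is empty after sanitizing")
    return f"{slug}-{kind}"


name = collection_name("dannyl1u/doppelganger", "issues")
```

Note that this simple scheme can map two different repositories to the same name (for example "a-b/c" and "a/b-c"), so a real implementation might prefer the numeric repository id.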
- Python 3.8+
- A GitHub account
- A server or hosting platform to run the app (e.g., Heroku, DigitalOcean, AWS)
- Ollama
- Go to your GitHub account settings.
- Click on "Developer settings" in the left sidebar.
- Select "GitHub Apps" and click "New GitHub App".
- Fill in the required information:
  - GitHub App name: Choose a unique name (e.g., "Issue Similarity Checker")
  - Homepage URL: Your app's website or your GitHub profile
  - Webhook URL: The URL where your server will be running (e.g., https://your-server.com/webhook)
  - Webhook secret: Generate a secure secret and save it for later use
- Set permissions:
  - Repository permissions:
    - Issues: Read & write
    - Pull requests: Read & write
    - Webhooks: Read-only
- Subscribe to events:
  - Issues
  - Pull request
- Create the app and note down the App ID
- Generate a private key and download it (you'll need this later)
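GitHub signs each webhook delivery with the webhook secret and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by a hex HMAC of the raw request body. A minimal sketch of generating a secret and checking that header with the standard library follows. The function names are hypothetical, and how this app performs the check is not shown in this README.

```python
import hashlib
import hmac
import secrets


def new_webhook_secret() -> str:
    """Generate a random 64-character hex secret for the 'Webhook secret' field."""
    return secrets.token_hex(32)


def sign(secret: str, body: bytes) -> str:
    """Compute the value GitHub would send in X-Hub-Signature-256 for this body."""
    digest = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    return f"sha256={digest}"


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the received header against our own HMAC in constant time."""
    if not header_value:
        return False
    expected = sign(secret, body).encode("utf-8")
    return hmac.compare_digest(expected, header_value.encode("utf-8"))
```

The check must run on the raw request bytes, before any JSON parsing, because re-serializing the payload can change its bytes and therefore its signature.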
- Clone this repository:
git clone https://github.com/dannyl1u/doppelganger.git
cd doppelganger
- Install dependencies:
pip install -r requirements.txt
- To create a new .env file, run the following command in your terminal:
cp .env.example .env
Open the newly created .env file and update the following variables with your own values:
* APP_ID: Replace your_app_id_here with your actual app ID.
* WEBHOOK_SECRET: Replace your_webhook_secret_here with your actual webhook secret.
* OLLAMA_MODEL: Replace your_chosen_llm_model_here with your chosen LLM model (e.g. "llama3.2"). Note: it must be an Ollama-supported model (see https://ollama.com/library for supported models).
* NGROK_DOMAIN: Replace your_ngrok_domain_here with your ngrok domain, if you have one.
- Place the downloaded private key in the project root and name it rsa.pem.
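A quick way to catch a half-edited .env is to check for missing values and leftover placeholders. The sketch below assumes the placeholder strings listed above and treats NGROK_DOMAIN as optional. The function name is hypothetical and not part of this project.

```python
from typing import List, Mapping

# Placeholder values as they appear in .env.example, per the list above.
PLACEHOLDERS = {
    "APP_ID": "your_app_id_here",
    "WEBHOOK_SECRET": "your_webhook_secret_here",
    "OLLAMA_MODEL": "your_chosen_llm_model_here",
}


def unset_variables(env: Mapping[str, str]) -> List[str]:
    """Return names of required variables that are missing, empty, or still placeholders."""
    problems = []
    for name, placeholder in PLACEHOLDERS.items():
        value = env.get(name, "").strip()
        if not value or value == placeholder:
            problems.append(name)
    return problems


example_env = {
    "APP_ID": "123456",
    "WEBHOOK_SECRET": "your_webhook_secret_here",  # forgotten placeholder
}
missing = unset_variables(example_env)
```

Here the result lists WEBHOOK_SECRET (still a placeholder) and OLLAMA_MODEL (absent). In practice the same function could be given os.environ once the variables have been loaded.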
- Run the application locally
Start the Flask application:
python3 app.py
The application will start running on http://localhost:4000
We will use ngrok for its simplicity.
Option 1: Generated public URL
In a new terminal window, start ngrok to create a secure tunnel to your local server:
ngrok http 4000
ngrok will generate a public URL (e.g., https://abc123.ngrok.io).
Append /webhook to the URL, e.g. https://abc123.ngrok.io -> https://abc123.ngrok.io/webhook
In another terminal window, start Ollama:
ollama run <an OLLAMA model here>
Option 2: Using the shell script with your own ngrok domain
Ensure all environment variables are set, then run:
./run-dev.sh
- Go back to your GitHub App settings.
- Update the Webhook URL to point to your deployed application (e.g., https://abc123.ngrok.io/webhook).
- Go to your GitHub App's settings page.
- In the "Install App" section, click "Install App" or "Add Installation".
- Choose the account where you want to install the app.
- Select the repository (or repositories) where you want to use the app.
- Confirm the installation.
Once installed, the app will automatically:
- Monitor new issues and PRs in the selected repositories.
- Compare new issues and PRs with existing ones using semantic similarity.
- Close and comment on highly similar issues and PRs to reduce duplication.
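GitHub names the event type in the X-GitHub-Event header ("issues" or "pull_request" for the subscriptions above) and the trigger in the payload's action field. A minimal sketch of filtering deliveries down to newly opened issues and PRs follows. The function name and return labels are hypothetical, and the app's own handler may be organized differently.

```python
from typing import Any, Mapping, Optional


def classify_event(event_name: str, payload: Mapping[str, Any]) -> Optional[str]:
    """Return 'issue' or 'pull_request' for newly opened items, else None."""
    # Only newly opened items are compared; edits, closes, labels, etc. are skipped.
    if payload.get("action") != "opened":
        return None
    if event_name == "issues" and "issue" in payload:
        return "issue"
    if event_name == "pull_request" and "pull_request" in payload:
        return "pull_request"
    return None


kind = classify_event(
    "issues",
    {"action": "opened", "issue": {"number": 7, "title": "Crash on startup"}},
)
```

Returning None for everything else lets the webhook endpoint acknowledge unrelated deliveries without doing any work.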
You can adjust the similarity threshold by modifying the SIMILARITY_THRESHOLD variable in the script. The default is set to 0.5.
- Check the server logs for any error messages.
- Ensure all environment variables are correctly set.
- Verify that the rsa.pem file is present and correctly formatted.
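The last check can be partly automated. The sketch below only verifies that a file exists and has matching PEM "BEGIN ... PRIVATE KEY" and "END ... PRIVATE KEY" lines. It does not parse or validate the key itself, and the function name is hypothetical.

```python
from pathlib import Path


def looks_like_pem_private_key(path: str) -> bool:
    """Surface check: file exists and is framed by matching PEM private-key markers."""
    file = Path(path)
    if not file.is_file():
        return False
    text = file.read_text(encoding="utf-8", errors="replace")
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 3:
        # Need at least a BEGIN line, one line of key data, and an END line.
        return False
    first, last = lines[0], lines[-1]
    if not (first.startswith("-----BEGIN ") and first.endswith("PRIVATE KEY-----")):
        return False
    # Extract the label (e.g. "RSA PRIVATE KEY") and require the same label at the end.
    label = first[len("-----BEGIN "):-len("-----")]
    return last == f"-----END {label}-----"
```

Calling looks_like_pem_private_key("rsa.pem") from the project root returns False for a missing, empty, or truncated file, which covers the most common copy-and-paste mistakes.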
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License.