PDF Similarity Matcher

PDF Similarity Matcher is a command-line tool that finds the PDFs in a directory that are most similar to a given input PDF. It extracts text features from each document, compares them with those of the input, and displays the closest matches.

Features

- Extracts text from PDF files.
- Processes and compares features from multiple PDFs.
- Calculates similarity scores between an input PDF and the PDFs in a directory.
- Optionally displays detailed key-value feature information for similar PDFs.
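
Conceptually, matching is a ranking problem: turn each document's text into a vector, score every candidate against the input, and keep the best matches. The sketch below is a minimal, dependency-free illustration under stated assumptions: it uses plain bag-of-words term counts and cosine similarity, and it assumes the text has already been extracted from each PDF. The project's requirements list scikit-learn and nltk, so the actual tool likely uses different vectorization and preprocessing (for example TF-IDF). The helper names `term_counts`, `cosine_similarity`, and `top_matches` are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Hypothetical helper: lowercase the text and count alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors (0.0 if either is empty)."""
    # Counter returns 0 for missing terms, so only shared terms contribute to the dot product.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(input_text: str, corpus: dict[str, str], top_n: int = 1) -> list[tuple[str, float]]:
    """Score every document in `corpus` (file name -> extracted text) against the input
    and return the `top_n` highest-scoring (name, score) pairs, best first."""
    query = term_counts(input_text)
    scored = [(name, cosine_similarity(query, term_counts(text))) for name, text in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    # Made-up texts standing in for text extracted from three PDFs.
    corpus = {
        "invoice_a.pdf": "invoice number total amount due",
        "invoice_b.pdf": "invoice total amount paid",
        "report.pdf": "quarterly sales report summary",
    }
    print([name for name, _ in top_matches("invoice amount due", corpus, top_n=2)])
    # → ['invoice_a.pdf', 'invoice_b.pdf']
```

Cosine similarity normalizes for document length, so a long PDF does not outrank a short one simply by containing more words; `top_n` plays the role of the `-t` option.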

Installation

Install the package with pip:

```
pip install pdfsim
```

Usage

To find the PDFs most similar to an input file, run:

```
pdfsim -d <directory_containing_pdfs> -i <input_pdf> [-t <top_n>] [-kv]
```

Arguments

- -d, --database (required): Path to the directory containing the PDF files to compare against.
- -i, --input (required): Path to the input PDF file you want to compare.
- -t, --top (optional, default: 1): Number of top similar PDFs to display.
- -kv (optional): Enable detailed key-value feature output for similar PDFs.
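
The documented options map naturally onto Python's standard `argparse` module. The sketch below is a hypothetical reconstruction from the argument list above, not the parser in the project's main.py; treating `-kv` as an on/off `store_true` switch and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented pdfsim options."""
    parser = argparse.ArgumentParser(
        prog="pdfsim",
        description="Find the PDFs in a directory that are most similar to an input PDF.",
    )
    parser.add_argument("-d", "--database", required=True,
                        help="directory containing the PDF files to compare against")
    parser.add_argument("-i", "--input", required=True,
                        help="input PDF file to compare")
    parser.add_argument("-t", "--top", type=int, default=1,
                        help="number of top similar PDFs to display (default: 1)")
    # Assumption: -kv is a boolean switch; argparse derives the attribute name args.kv.
    parser.add_argument("-kv", action="store_true",
                        help="show detailed key-value feature output for similar PDFs")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-d", "./pdfs", "-i", "query.pdf", "-t", "3", "-kv"])
    print(args.database, args.input, args.top, args.kv)  # ./pdfs query.pdf 3 True
```

Because `-kv` is registered as a full option string, argparse matches it exactly rather than splitting it into `-k` and `-v`.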

Contributing

Follow these steps to set up the project locally:

Clone the repository:

```
git clone https://github.com/yourusername/pdfsim.git
cd pdfsim
```

Create a virtual environment:
```
python3 -m venv venv
```
Activate the virtual environment:
- On Windows:
```
venv\Scripts\activate
```
- On macOS/Linux:
```
source venv/bin/activate
```
Install the required packages:
```
pip install -r requirements.txt
```
Ensure requirements.txt includes the necessary libraries:
```
PyPDF2
scikit-learn
nltk
```

KrishavRajSingh/pdfsim
