Update license to AGPL-3.0 with non-commercial use restrictions
tylerbcrawford committed Dec 17, 2024
1 parent 55c840b commit cd28d7d
Showing 3 changed files with 74 additions and 309 deletions.
330 changes: 22 additions & 308 deletions DOCUMENTATION.md
@@ -1,318 +1,32 @@
# 📚 LLiMage Documentation

## 🎯 1. Project Overview
[Previous content remains the same until the License section...]

LLiMage is a Python-based web application designed for efficient PDF processing and analysis. It provides local processing capabilities for extracting text, performing OCR on images, and analyzing charts within PDF documents.

### Key Features
- PDF text extraction with high accuracy
- Image OCR processing with flexible output options
- Advanced chart recognition and analysis
- Shape detection and classification
- Pattern recognition and structural analysis
- Web-based interface with drag-and-drop functionality
- Local processing for enhanced security
- Multiple output formats (text, JSON)
- Comprehensive test suite

## 🛠️ 2. Tech Stack

### Core Technologies
- **Python 3.x**
  - Primary development language
  - Chosen for extensive library support and ease of development

### Framework
- **Flask**
  - Lightweight web framework
  - Perfect for MVP development
  - Easy to extend
  - Minimal setup requirements

### PDF Processing Libraries
- **pdfplumber**
  - Reliable PDF parsing
  - Accurate text extraction
  - Chosen for its robust PDF handling capabilities

### OCR Technology
- **pytesseract**
  - OCR functionality
  - Integration with Tesseract OCR engine
  - Supports multiple languages
- **pdf2image with poppler**
  - PDF to image conversion
  - Required for OCR processing

### Image Processing
- **OpenCV**
  - Computer vision capabilities
  - Shape detection and analysis
  - Pattern recognition
  - Image preprocessing
  - Feature extraction

### Frontend
- **HTML/CSS/JavaScript**
  - Modern drag-and-drop interface
  - Responsive design
  - Client-side file handling

## 🚀 3. Installation and Setup

### Prerequisites
1. Python 3.x
2. Tesseract OCR
3. Poppler Utils
4. OpenCV

### System-Specific Installation

#### macOS
```bash
# Install system dependencies
brew install tesseract
brew install poppler
brew install opencv

# Clone repository
git clone https://github.com/tylerbcrawford/llimage.git
cd llimage

# Create and activate virtual environment
python3 -m venv venv
source venv/bin/activate

# Install Python dependencies
pip install -r requirements.txt
```

#### Linux (Ubuntu/Debian)
```bash
# Install system dependencies
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install poppler-utils
sudo apt-get install python3-opencv

# Clone and setup (same as macOS)
git clone https://github.com/tylerbcrawford/llimage.git
cd llimage
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

#### Windows
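1. Download and install Tesseract OCR, then download Poppler for Windows from: http://blog.alivate.com.au/poppler-windows/
2. Add Tesseract and Poppler to the system PATH
3. Install OpenCV: `pip install opencv-python`
4. Follow the same Python setup steps as above (clone the repository, create a virtual environment, install `requirements.txt`)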

## 📁 4. Code Structure

```
llimage/
├── app.py                      # Main Flask application
├── create_test_pdfs.py         # Test PDF generation
├── requirements.txt            # Python dependencies
├── static/                     # Static assets
│   ├── script.js               # Frontend JavaScript
│   └── style.css               # CSS styles
├── templates/                  # Flask templates
│   └── index.html              # Main web interface
├── test_pdfs/                  # Test PDF files
│   ├── chart.pdf               # Chart test file
│   ├── text_and_image.pdf      # Mixed content test
│   └── text_only.pdf           # Text test file
├── test_images/                # Test image outputs
│   ├── test_bar_chart*.png     # Bar chart test images
│   ├── test_pie_chart*.png     # Pie chart test images
│   ├── test_line_chart*.png    # Line chart test images
│   └── test_shapes*.png        # Shape test images
├── tests/                      # Test suite
│   └── test_basic.py           # Basic tests
├── llimage/                    # Main package
│   ├── chart/                  # Chart processing
│   │   ├── detector.py         # Shape detection
│   │   ├── extractor.py        # Data extraction
│   │   └── tests/              # Chart-specific tests
│   ├── image/                  # Image processing
│   │   ├── opencv.py           # OpenCV utilities
│   │   ├── processor.py        # Image processing
│   │   └── tests/              # Image-specific tests
│   └── output/                 # Output formatting
│       ├── json.py             # JSON output
│       ├── text.py             # Text output
│       └── tests/              # Output-specific tests
└── cline_docs/                 # Project documentation
    ├── projectRoadmap.md       # Project goals
    ├── currentTask.md          # Current status
    ├── techStack.md            # Technology details
    └── codebaseSummary.md      # Code overview
```

## 🔧 5. Functionality

### PDF Processing Pipeline
1. **File Upload**
   - Drag-and-drop or file selection
   - Initial validation
   - Temporary storage

2. **Text Extraction**
   - PDF parsing using pdfplumber
   - Text content extraction
   - Structure preservation

3. **Image Processing**
   - Image identification
   - Conversion to processable format
   - OCR processing
   - Shape detection
   - Pattern recognition
   - Flexible output options:
     - Option 1: Separate Image Files
       * Extracts images to separate files (PNG/JPEG)
       * Creates dedicated output_images folder
       * Uses standardized naming (pageX_imgY.png)
       * References images in text output
     - Option 2: Textual Descriptions
       * Generates text-based descriptions
       * Includes OCR results and visual content analysis
       * Embeds descriptions in text output
       * No separate image files saved

4. **Chart Recognition**
   - Shape detection and classification
   - Pattern analysis
   - Structural relationship detection
   - Data extraction
   - Chart type identification

5. **Result Generation**
   - Compilation of extracted data
   - Multiple output formats
   - Download link provision
   - Configurable image handling modes
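
The standardized naming from Option 1 (`pageX_imgY.png` inside an `output_images` folder) could be sketched roughly as follows. This is only an illustration, not LLiMage's actual code: the helper names `image_filename` and `image_reference`, the 1-based numbering, and the format of the reference line are all assumptions.

```python
from pathlib import Path

# Only the pageX_imgY.png pattern and the output_images folder come from the
# documentation; everything else here is a hypothetical illustration.
OUTPUT_DIR = Path("output_images")


def image_filename(page: int, index: int, ext: str = "png") -> str:
    """Build a file name following the documented pageX_imgY pattern."""
    if page < 1 or index < 1:
        raise ValueError("page and index are assumed to start at 1")
    return f"page{page}_img{index}.{ext}"


def image_reference(page: int, index: int) -> str:
    """Return a placeholder line the text output could use to point at an image."""
    path = OUTPUT_DIR / image_filename(page, index)
    return f"[image: {path.as_posix()}]"


print(image_reference(2, 1))
```

Keeping the name builder separate from the reference line means the same function can name the file on disk and the pointer embedded in the text output, so the two cannot drift apart.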

### Usage Example
1. Access web interface at `http://127.0.0.1:5000`
2. Upload PDF through drag-and-drop or file selection
3. Wait for processing completion
4. Download results in desired format

### Image Output Configuration
- User-configurable image handling mode
- Default: Textual descriptions mode
- Optional: Separate image files mode
- Future: Hybrid mode support (both descriptions and files)
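
One way to picture the mode selection described above is a small enum with a default. The names below (`ImageMode`, `resolve_image_mode`, and the string values) are hypothetical; the documentation only states that textual descriptions are the default, separate files are optional, and hybrid mode is planned.

```python
from enum import Enum


class ImageMode(Enum):
    """Hypothetical mode names; the docs describe the behaviours, not these identifiers."""
    DESCRIPTIONS = "descriptions"  # default: descriptions embedded in the text output
    FILES = "files"                # optional: images saved as separate files
    HYBRID = "hybrid"              # listed as future work


def resolve_image_mode(value=None):
    """Map a user-supplied setting to a mode, falling back to the documented default."""
    if value is None:
        return ImageMode.DESCRIPTIONS
    mode = ImageMode(value.strip().lower())  # raises ValueError for unknown values
    if mode is ImageMode.HYBRID:
        raise NotImplementedError("hybrid mode is described as a future feature")
    return mode


print(resolve_image_mode())         # falls back to the default mode
print(resolve_image_mode("files"))  # explicit opt-in to separate image files
```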

## 📖 6. Documentation

### Environment Variables
- No environment variables required for basic setup
- All configuration is handled through Python files

### Configuration Files
- `requirements.txt`: Python package dependencies
- `.gitignore`: Version control exclusions
- `config/*.json`: Application configuration

## 🧪 7. Testing

### Running Tests
```bash
# Activate virtual environment
source venv/bin/activate

# Run test suite
pytest
```

### Test Files
- Located in `test_pdfs/` directory
- Test images in `test_images/` directory
- Cover different use cases:
  - Text extraction (text_only.pdf)
  - Image processing (text_and_image.pdf)
  - Chart recognition (chart.pdf)
  - Shape detection (test_shapes*.png)
  - Pattern recognition (test_*_chart*.png)

## ⚠️ 8. Known Issues and Limitations

- Limited to single-page PDFs in current version
- Memory-intensive for large PDFs
- No persistent storage of results
- Limited error handling for complex PDFs

## 🚀 9. Future Enhancements

Phase 3 - Advanced Image Processing:
- Flexible image output options:
  * Separate image file extraction
  * Advanced textual descriptions
  * Hybrid mode support
- Enhanced image analysis and description generation
- Configurable output preferences

Additional Planned Features:
- Multi-page PDF support
- Enhanced data extraction
- Multiple output formats
- Progress bar implementation
- Enhanced error handling
- User authentication
- Result history
- Batch processing
- API development

## 🤝 10. Contributing

1. Fork the repository
2. Create a feature branch
3. Implement changes
4. Add/update tests
5. Submit pull request

### Development Guidelines
- Follow PEP 8 style guide
- Add tests for new features
- Update documentation
- Maintain security focus

## 🙏 11. Acknowledgments
## 📄 12. License

- Flask framework community
- Tesseract OCR project
- OpenCV community
- PDF processing libraries:
  - pdfplumber
  - pdf2image
  - pytesseract
- Open source community
### GNU Affero General Public License v3.0 (AGPL-3.0)

## 📄 12. License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) with additional non-commercial use restrictions.

MIT License
#### Key Points:
- Free for personal and educational use
- Commercial use requires a separate license
- Contact tylerbcrawford@gmail.com for commercial licensing
- Full AGPL-3.0 terms apply to non-commercial use

Copyright (c) 2024
For complete license terms and conditions, see the [LICENSE](LICENSE) file in the repository.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
#### Commercial Use
Commercial use of any kind, including but not limited to:
- Selling or licensing access to the software
- Using the software as part of a commercial service, product, or SaaS
- Redistributing the software for profit

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
requires obtaining a separate commercial license from the copyright holder (Tyler B. Crawford).

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
#### Non-Commercial Use
Non-commercial use is permitted under the terms of the AGPL-3.0 license, which ensures:
- Freedom to use and modify the software
- Requirement to share modifications
- Network use provision (if you modify and run the software on a server, you must make the modified source available)
- Attribution requirements
45 changes: 45 additions & 0 deletions LICENSE
@@ -0,0 +1,45 @@
# License

## GNU Affero General Public License v3.0 (AGPL-3.0)

### Preamble

The GNU Affero General Public License is a free, copyleft license for software and other kinds of works, specifically designed to ensure cooperation when the software is run over a network.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

---

### Additional Terms: Non-Commercial Addendum

**1. Commercial Use Restriction**
This software is provided **free of charge** for personal, educational, and non-commercial purposes only. Commercial use of any kind, including but not limited to:
- Selling or licensing access to the software.
- Using the software as part of a commercial service, product, or SaaS.
- Redistributing the software for profit.

requires obtaining a separate **commercial license** from the copyright holder (Tyler B. Crawford).

**2. Explicit Commercial Use Definition**
"Commercial use" means any use or distribution that primarily intends to generate revenue or is part of a for-profit activity.

**3. Exceptions**
If you wish to use this software commercially, contact the copyright holder at tylerbcrawford@gmail.com to negotiate a commercial license agreement.

---

## Full AGPL v3 License Text

[This license is included here.](https://www.gnu.org/licenses/agpl-3.0.txt)

You can find the full text of the AGPL v3 below:

### GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007

#### Preamble
The GNU Affero General Public License is a free, copyleft license for software and other kinds of works. The licenses for most software and other practical works are designed to take away your freedom to share and change the works. By contrast, our General Public Licenses are intended to guarantee your freedom to share and change all versions of a program--to make sure it remains free software for all its users.
8 changes: 7 additions & 1 deletion README.md
@@ -136,4 +136,10 @@ pytest
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

## 📄 License
MIT License
This project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0) with additional non-commercial use restrictions. See the [LICENSE](LICENSE) file for details.

Key points:
- Free for personal and educational use
- Commercial use requires a separate license
- Contact tylerbcrawford@gmail.com for commercial licensing
- Full AGPL-3.0 terms apply to non-commercial use
