A powerful desktop application for extracting and analyzing content from web URLs. Built with Python and Tkinter, this tool provides a user-friendly interface for processing multiple URLs simultaneously, extracting key information, and saving results locally.
If you find this tool useful, please consider supporting its development by buying me a coffee. Your support helps me continue improving and maintaining the tool, and is greatly appreciated!
- **URL Processing**: Process multiple URLs simultaneously with a queue-based system
- **Content Analysis**: Extract titles, keywords, and summaries, and generate relevant hashtags
- **Progress Tracking**: Real-time status updates and progress monitoring for each task
- **Auto-save**: Automatically save processed content to local files
- **Task Management**: Pause, restart, or review completed tasks
- **Configurable Settings**: Customize the save directory and auto-save preferences
- Python 3.7 or higher
- tkinter (usually comes with Python)
- Ollama for running the LLaMA model
- Install Ollama based on your operating system:

  **Linux:**

  ```bash
  curl https://ollama.ai/install.sh | sh
  ```

  **macOS:**

  ```bash
  brew install ollama
  ```

  **Windows:** Download and run the installer from Ollama's official website
- Start the Ollama service:

  ```bash
  ollama serve
  ```
- Pull the LLaMA model:

  ```bash
  ollama pull llama2
  ```
- Verify the installation:

  ```bash
  ollama list
  ```

  You should see `llama2` in the list of available models.
- Configure the application to use Ollama:
  - The application is pre-configured to use "llama3.2" as the model name
  - Update the model name in the `ContentExtractor` initialization if you are using a different model version (for example, the `llama2` model pulled above)
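As a rough sketch of what that looks like (the real `ContentExtractor` constructor in this project may take different parameters; the names below are illustrative), switching models is just a matter of passing the right name:

```python
# Hypothetical sketch of the ContentExtractor initialization; the
# actual constructor signature in this project may differ.
class ContentExtractor:
    def __init__(self, model: str = "llama3.2"):
        # Name of the Ollama model used for content analysis
        self.model = model

# Pass the model you actually pulled, e.g. llama2:
extractor = ContentExtractor(model="llama2")
```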
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd content-extractor
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Ensure Ollama is running:
- Start Ollama service if not already running
- Verify the LLaMA model is available
- Start the application:

  ```bash
  python content_extractor_gui.py
  ```
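Before launching, you can confirm programmatically that the Ollama service is reachable. This sketch (the helper name is our own) queries Ollama's HTTP API at `GET /api/tags`, the endpoint behind `ollama list`:

```python
import json
import urllib.request

def list_ollama_models(base_url="http://localhost:11434"):
    """Return installed Ollama model names, or None if the service is unreachable."""
    try:
        # GET /api/tags is the endpoint behind `ollama list`
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        return None  # service not running or not reachable

models = list_ollama_models()
if models is None:
    print("Ollama is not running; start it with `ollama serve`.")
```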
- **Configure Settings:**
  - Click "Browse" to set your preferred project directory
  - Toggle the auto-save option as needed
- **Process URLs:**
  - Enter a URL in the input field
  - Click "Add URL" to start processing
  - Monitor progress in the tasks list
  - View results by clicking on completed tasks
- **Manage Tasks:**
  - Click on any task to view its details
  - Use the restart button (↻) to reprocess failed or completed tasks
  - Save results manually using the "Save Result" button if auto-save is disabled
- **Queued**: Task is waiting to be processed
- **Processing**: Currently extracting content
- **Completed**: Successfully processed
- **Error**: Failed to process (can be restarted)
Results are saved as JSON files with the following structure:
```json
{
  "title": "Article Title",
  "keywords": [
    "keyword1",
    "keyword2",
    ...
  ],
  "content_summary": "Brief summary of the content",
  "hashtags": [
    "#hashtag1",
    "#hashtag2",
    ...
  ],
  "full_article": "Complete article text"
}
```
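For downstream scripting, a saved result file can be read back with the standard `json` module. The field names below mirror the structure above; the values are illustrative:

```python
import json

# A result matching the structure above (values illustrative)
saved = json.dumps({
    "title": "Article Title",
    "keywords": ["keyword1", "keyword2"],
    "content_summary": "Brief summary of the content",
    "hashtags": ["#hashtag1", "#hashtag2"],
    "full_article": "Complete article text",
})

# Parse it back, as any consumer of the saved files would
result = json.loads(saved)
print(result["title"], result["hashtags"])
```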
The application stores its configuration in `content_extractor_config.json`:

```json
{
  "project_dir": "/path/to/save/directory",
  "auto_save": true
}
```
- **URLTask**: Manages individual URL processing tasks
  - Tracks status, progress, and results
  - Handles timing and error states
- **TaskPanel**: UI component for displaying task information
  - Real-time status updates
  - Progress bar
  - Duration tracking
  - Save path display
- **ContentExtractorGUI**: Main application interface
  - Manages the task queue and threading
  - Handles file I/O and configuration
  - Provides user interface controls
- Uses a queue-based system for task management
- Processes multiple URLs concurrently
- Maintains UI responsiveness with proper thread management
- Limits concurrent processing to prevent resource exhaustion
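The queue-plus-workers pattern described above can be sketched as follows (the worker count and names are assumptions for illustration, not the application's actual values):

```python
import queue
import threading

MAX_WORKERS = 3            # assumed cap on concurrent processing
tasks = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    while True:
        url = tasks.get()
        if url is None:    # sentinel tells this worker to exit
            tasks.task_done()
            break
        processed = f"processed:{url}"  # stand-in for real content extraction
        with results_lock:
            results.append(processed)
        tasks.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(MAX_WORKERS)]
for t in threads:
    t.start()

for url in ["https://example.com/a", "https://example.com/b"]:
    tasks.put(url)
for _ in threads:
    tasks.put(None)        # one sentinel per worker
tasks.join()               # block until every queued task is processed
```

In a Tkinter application, workers should hand results back to the main thread (for example via `widget.after`) rather than updating widgets directly, since Tkinter widgets are not thread-safe.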
The application includes comprehensive error handling for:
- Invalid URLs
- Network issues
- Processing failures
- File system operations
- Configuration management
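As one illustration of the first item, invalid URLs can be rejected before a task is ever queued. This is a sketch (the application's actual validation may be stricter) using only the standard library:

```python
from urllib.parse import urlparse

def is_valid_url(url: str) -> bool:
    # Require an http(s) scheme and a network location (host)
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```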
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License.
For issues and feature requests, please:
- Check existing issues in the repository
- Create a new issue with detailed information
- Include steps to reproduce any bugs
- Built using Python and Tkinter
- Uses the LLaMA model, through Ollama, for content analysis