Bulldozer

Bulldozer is a script that automates downloading, organizing, analyzing, and creating torrents for podcasts. It is highly customizable: nearly every setting you might want to change is defined in the configuration file.

Features

  • Download podcast episodes using RSS feeds
  • Check for duplicate episodes using tracker API
  • Organize and analyze downloaded files
  • Generate reports based on the downloaded content
  • Fetch data from the Podchaser and Podcastindex APIs
  • Fetch data from Podnews
  • Automatic RSS censoring for matching premium sources
  • Optional local database with metadata for improved flexibility
  • Option to split active podcasts on current year (database required)
  • Partial download of feed using --match-titles
  • Torrent file creation with piece size calculation
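
The piece size calculation mentioned in the last feature can be pictured roughly as follows. This is a minimal sketch, not Bulldozer's actual logic: the function name `choose_piece_size`, the target of about 1,500 pieces, and the 16 KiB to 16 MiB bounds are all assumptions chosen for illustration.

```python
def choose_piece_size(total_bytes: int, target_pieces: int = 1500,
                      min_exp: int = 14, max_exp: int = 24) -> int:
    """Return a power-of-two piece size in bytes.

    Hypothetical helper: picks the smallest power of two between
    2**min_exp and 2**max_exp that keeps the piece count at or below
    target_pieces. If no size in that range is large enough, the
    largest one is returned.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    for exp in range(min_exp, max_exp + 1):
        piece_size = 1 << exp
        pieces = -(-total_bytes // piece_size)  # ceiling division
        if pieces <= target_pieces:
            return piece_size
    return 1 << max_exp
```

If the result is handed to mktorrent, note that its `-l` option takes the exponent rather than the byte count; `choose_piece_size(total).bit_length() - 1` gives that exponent.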

Requirements

  • Python 3.12.0+
  • Required Python packages (listed in requirements.txt)
  • mktorrent
  • podcast-dl 10.3.1+

Installation

  1. Clone the repository:

    git clone git@github.com:lewler/bulldozer.git
    cd bulldozer
  2. Install the required Python packages:

    pip install -r requirements.txt
  3. Install additional dependencies:

    sudo apt-get install libwebp-dev libavif-dev
  4. Create your own config file, and add the things you need to override:

    touch config.yaml
  5. If you want to use the Podchaser API you will need a token, which is free up to 25k points per month.

Configuration

Edit the config.yaml file to set up your preferences and API keys. The configuration file covers nearly every setting needed to customize the behavior of the script, and the settings most users need to change are at the top. The file is commented, so it should be easy to see what each setting does.

Note that you do not need to copy the entire default file, and you can leave out any values you do not want to change. This means less work when new settings are added to config.default.yaml.
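
The override approach described above can be illustrated with a small recursive merge. This is a sketch under assumptions, not the script's real loader: `merge_config` is a hypothetical helper, and the setting names are invented for the example rather than taken from config.default.yaml. Plain dictionaries stand in for the parsed YAML files.

```python
from copy import deepcopy


def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return defaults with overrides applied; nested dicts merge key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings, so merge them instead of replacing.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical settings, for illustration only.
defaults = {"api": {"token": None, "timeout": 30}, "log_level": "INFO"}
user = {"api": {"token": "my-token"}}
settings = merge_config(defaults, user)
```

Only the overridden value changes; every other setting keeps its default, which is why a short user config keeps working when new defaults are added.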

Upgrading

Upgrading should be fairly simple, but it can get messy if you are skipping several versions. In that case, do a fresh install and copy your settings over. To upgrade, do the following:

  1. Update the codebase

    git pull
  2. Make sure requirements are up-to-date

    pip install -r requirements.txt --upgrade
  3. Run the config checker to see if your config is outdated

    python bulldozer --check-config

    The config checker will let you know if there are settings in your config that are outdated (i.e., they don't exist in the default config).
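
The idea behind the check, finding settings that exist in the user config but not in the default config, can be sketched as below. This is an illustration only: `find_outdated_keys` is a hypothetical function, and the real `--check-config` implementation may work differently. Plain dictionaries stand in for the parsed YAML files.

```python
def find_outdated_keys(user: dict, defaults: dict, prefix: str = "") -> list[str]:
    """List dotted paths of keys present in user but absent from defaults."""
    outdated = []
    for key, value in user.items():
        path = f"{prefix}{key}"
        if key not in defaults:
            outdated.append(path)
        elif isinstance(value, dict) and isinstance(defaults[key], dict):
            # Descend into nested sections that exist on both sides.
            outdated.extend(find_outdated_keys(value, defaults[key], path + "."))
    return outdated
```

An empty result means every key in the user config still has a counterpart in the defaults.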

Usage

Command Line Interface

Run the script using the command line interface:

python bulldozer <input>

<input>: RSS feed URL, directory path, local RSS file path, or name to dupecheck.

Note that if you're on Linux, you should be able to run the script this way:

chmod +x bulldozer
./bulldozer <input>
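
Since `<input>` can be an RSS feed URL, a directory path, a local RSS file path, or a name, a script has to tell these apart somehow. The sketch below shows one plausible way to do so; `classify_input` and its return values are hypothetical, and the order of the checks is an assumption rather than Bulldozer's documented behavior.

```python
import os
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Return 'url', 'directory', 'file', or 'name' for a command-line input."""
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    if os.path.isdir(value):
        return "directory"
    if os.path.isfile(value):
        return "file"
    # Anything else is treated as a plain name, for example for a dupecheck.
    return "name"
```

Checking for a URL first avoids touching the filesystem for remote feeds, and falling back to a name means any input that is neither a URL nor an existing path is still accepted.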

Options

  • --censor-rss: Make sure the RSS feed is censored.
  • --report-only: Only check the files.
  • --download-only: Only download the files.
  • --refresh: Don't use the data in the database.
  • --check-files: Only check the files.
  • --dupecheck: Search the API for <input>.
  • --make-torrent: Only create a torrent file.
  • --check-config: Check if user config is valid.
  • --log-level: Set the logging level (e.g., DEBUG, INFO, WARNING, ERROR, CRITICAL).
  • --search-term: Use the given value as the search term instead of the podcast name.
  • --name: Use the given value as the podcast name.
  • --match-titles: Only keep episodes in the feed whose titles match the given value.
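
To make the effect of `--match-titles` concrete, here is a minimal sketch of title filtering. It assumes a case-insensitive substring match, which may not be how the script actually matches titles; `filter_by_title` and the episode titles are invented for the example.

```python
def filter_by_title(titles: list[str], pattern: str) -> list[str]:
    """Keep only titles containing pattern, ignoring case (assumed semantics)."""
    needle = pattern.casefold()
    return [title for title in titles if needle in title.casefold()]


# Hypothetical episode titles, for illustration only.
episodes = ["Bonus: Live Q&A", "Episode 1: Pilot", "Episode 2: Follow-up"]
kept = filter_by_title(episodes, "episode")
```

With these titles, `kept` holds the two numbered episodes and drops the bonus.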

Project Structure

  • bulldozer: Main script
  • classes/: Contains various classes used in the project.
    • apis/: Contains classes to interact with various APIs.
      • podcastindex.py: Interacts with the Podcastindex API
      • podchaser.py: Interacts with the Podchaser API
    • scrapers/: Contains classes to scrape websites.
      • podnews.py: Scrapes data from Podnews.
    • cache.py: Handles the caching.
    • data_formatter.py: Methods for transforming data.
    • database.py: Handles the database logic.
    • dupe_checker.py: Checks for duplicates.
    • file_analyzer.py: Analyzes downloaded files.
    • file_organizer.py: Organizes downloaded files.
    • podcast_image.py: Handles podcast image processing.
    • podcast_metadata.py: Manages podcast metadata.
    • podcast.py: Represents a podcast and its metadata.
    • report_template.py: Templates for generating reports.
    • report.py: Generates reports based on downloaded content.
    • rss.py: Handles RSS feed operations.
    • torrent_creator.py: Creates torrent files.
    • utils.py: Utility functions.
  • logs/: Contains log files.
  • config.example.yaml: Example configuration file.
  • requirements.txt: List of required Python packages.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request for any changes.

Acknowledgements