Album scraper

A modular Python script for scraping images from websites.

See CHANGELOG for the changelog; it is a high-level overview, not a detailed log.

Contents

  1. Description
  2. Motivation
  3. History
  4. How does it work?
    1. Requirements
    2. Installation
    3. Create a module
    4. Executing a module
  5. What pages does it work on?
  6. Legal notice

Description

Back to the contents

Scrapes the images from a website and organizes them as an album (file-wise).

It's not perfect, but it's straight to the point. It (somewhat) complies with Pylint, where useful.
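To make the "album" idea concrete, here is a minimal sketch of how scraped image URLs could be mapped to ordered file names inside one album folder. The function name `album_paths`, the zero-padded numbering, and the fallback extension are assumptions made for illustration, not the project's actual naming scheme.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def album_paths(album_name, image_urls, default_ext=".jpg"):
    """Map each image URL to '<album_name>/<index><ext>' (hypothetical layout)."""
    # Pad the index so that files sort in scraping order in a file browser.
    width = max(3, len(str(len(image_urls))))
    paths = []
    for index, url in enumerate(image_urls, start=1):
        # Keep the extension of the host's file name, if it has one.
        ext = PurePosixPath(urlparse(url).path).suffix or default_ext
        paths.append(f"{album_name}/{index:0{width}d}{ext}")
    return paths
```

Numbering the files instead of keeping the host's file names is one way to make a skipped image easy to spot, since a gap in the sequence stands out.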

Motivation

Back to the contents

Downloading and organizing images by hand, for whatever reason, is really annoying and carries too much mental overhead. What if you skipped an image? How would you check? The files are unsorted and may keep their default names (the filename on the host). Will you rename them manually? How many will you miss while doing so? And so on.

Why act like a robot when a robot can do it for you?

History

Back to the contents

The project started in 2022, more than a year ago. I applied small modifications here and there, iterating, and finally decided to encapsulate it for modular use.

It started as an assignment that reminded me of web scraping as a technique, and an opportunity arose to automate one of my routines. I expanded upon it until I decided to publish it as open source.

How does it work?

Back to the contents

Requirements

Back to the contents

  • Python 3.8.4 or higher
    • not tested on lower versions, but it should work on >= 3.6.x
  • A decent internet connection; it worked nicely at 20 MB/s

Libraries:

  • Pip

Or manually install the packages listed in the requirements.txt file.

Installation

Back to the contents

python -m pip install -r requirements.txt
# or, on Linux/Unix systems, where Python 3 usually provides pip3 by default:
pip3 install -r requirements.txt

Create a module

Back to the contents

Follow the example module.
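For orientation, a module could look roughly like the sketch below. Only the name `init` comes from this README; `PAGE_URL`, `ImageCollector`, and `extract_image_urls` are hypothetical names, and the real example module may be structured quite differently. The sketch uses only the standard library.

```python
"""Hypothetical module sketch; only the name `init` comes from this README."""
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = "https://example.com/gallery/"  # placeholder, not a real target


class ImageCollector(HTMLParser):
    """Collect the absolute URL of every <img src=...> on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page URL.
                self.image_urls.append(urljoin(self.base_url, src))


def extract_image_urls(html, base_url):
    """Return the image URLs found in an HTML string, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls


def init():
    """Entry point: download the page and return the image URLs found on it."""
    with urlopen(PAGE_URL) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_image_urls(html, PAGE_URL)
```

Keeping the parsing in its own function means it can be tried on a saved HTML file without touching the network.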

Executing a module

Back to the contents

To execute a module, create an __init__.py at the root of the modules folder and "export" the init function you want to run; __main__.py will handle the rest:

from .example import init

Then run the project from its root:

python . # python3 for unix/linux
# or run the project folder
python album-scraper

Or look out for the CLI, which might make this easier and offer more options.
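How __main__.py picks up the exported function is not described here. One way such a runner could work is sketched below; the function name `run_exported_init` and the default package name `modules` are assumptions, not the project's actual code.

```python
import importlib


def run_exported_init(package_name="modules"):
    """Import the package and call the `init` function its __init__.py exports."""
    package = importlib.import_module(package_name)
    init = getattr(package, "init", None)
    if not callable(init):
        raise SystemExit(f"{package_name}/__init__.py does not export init()")
    return init()
```

With this kind of design, switching to another module only requires changing the single import line in __init__.py.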

What pages does it work on?

Back to the contents

Those without pagination, and with all the links you want to scrape visible on the homepage.

Figure: example page layouts (image designed with Excalidraw).
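As an illustration of the "all links visible on the homepage" requirement, the sketch below collects every link from a single HTML page in one pass, with no "next page" handling at all. The names `LinkCollector` and `homepage_links` are hypothetical, and the project itself may gather links differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect unique absolute URLs from the <a href=...> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip empty links and in-page anchors such as "#top".
        if href and not href.startswith("#"):
            link = urljoin(self.base_url, href)
            if link not in self.links:
                self.links.append(link)


def homepage_links(html, base_url):
    """Return the links found on the homepage, in order, without duplicates."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

A paginated site would hide some links behind further requests, so a single-pass collector like this one would miss them.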

Legal notice

Back to the contents

I do not endorse any illegal activity. Images remain under the licensing and ownership of their rightful author(s). If a website's robots.txt does not allow scraping, any wrongful, mischievous, or illegal act is still illegal and is not my responsibility.

Use these scripts at your own risk.
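If you want to check a site's robots.txt before scraping, the standard library's `urllib.robotparser` can do it. The helper below is a small sketch: the function name `is_allowed` is hypothetical, and whether the scraper performs such a check itself is not stated in this README.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

The robots.txt content is passed in as text here so the check can be tried offline; `RobotFileParser` also provides `set_url()` and `read()` to download the file directly.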