A collection of useful data scraping tools scripted in Python

VidiHawk/web-scraping-tools

Web Scraping Tools

This is intended to grow into a library of useful data scraping tools scripted in Python.

Index

Images

Video and audio

These scripts are mainly based on the ffmpeg and youtube-dl libraries. Note that youtube-dl downloads can be very slow (~50 KB/s). The newer yt-dlp library is a fork of youtube-dl that offers faster downloads (~5 MiB/s) and additional tools. More on these libraries in the links below.
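As a rough illustration of how a script built on yt-dlp and ffmpeg might look, here is a minimal sketch that downloads the audio track of a video and converts it to MP3. The function names `build_audio_options` and `download_audio`, the `downloads` output folder, and the chosen codec and quality are assumptions made for this example and are not taken from the scripts in this repo. It assumes yt-dlp is installed (`pip install yt-dlp`) and that ffmpeg is available on your PATH.

```python
from typing import Any, Dict, List


def build_audio_options(output_dir: str = "downloads") -> Dict[str, Any]:
    """Build a yt-dlp options dict for audio-only downloads (hypothetical helper)."""
    return {
        # Prefer the best audio-only stream, fall back to the best combined one.
        "format": "bestaudio/best",
        # Name each output file after the video title.
        "outtmpl": f"{output_dir}/%(title)s.%(ext)s",
        # Ask yt-dlp to run ffmpeg afterwards and extract an MP3 file.
        "postprocessors": [
            {
                "key": "FFmpegExtractAudio",
                "preferredcodec": "mp3",
                "preferredquality": "192",
            }
        ],
    }


def download_audio(urls: List[str], output_dir: str = "downloads") -> None:
    """Download the audio of each URL (hypothetical helper, needs network access)."""
    # Imported here so the options helper can be used without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_audio_options(output_dir)) as ydl:
        ydl.download(urls)
```

Calling `download_audio(["<video url>"])` with a real URL should then write an MP3 file into the `downloads` folder.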

Prerequisites

Requirements

  • git
    • You'll know you did it right if you can run git --version and you see a response like git version x.x.x
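If you prefer to check this from Python, the same test can be sketched as below. The helper names `parse_git_version` and `installed_git_version` are made up for this example and are not part of the repo.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_git_version(output: str) -> Optional[Tuple[int, ...]]:
    """Extract the numeric version from text such as 'git version 2.25.1'."""
    match = re.search(r"git version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def installed_git_version() -> Optional[Tuple[int, ...]]:
    """Return the installed git version, or None if git is not on PATH."""
    if shutil.which("git") is None:
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=False
    )
    return parse_git_version(result.stdout)


if __name__ == "__main__":
    version = installed_git_version()
    if version is None:
        print("git was not found; install it before cloning the repo")
    else:
        print("git found, version", ".".join(map(str, version)))
```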

Setup

Clone this repo

git clone https://github.com/VidiHawk/web-scraping-tools

cd web-scraping-tools

Then install dependencies

pip install -r requirements.txt
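To confirm that the install worked, one option is to compare the names listed in requirements.txt with what is installed in the current environment. The sketch below is only an illustration: `requirement_names` and `missing_packages` are hypothetical helpers, and the parsing handles simple lines such as `name==1.0` only, not every form pip accepts.

```python
import re
from importlib import metadata
from typing import List


def requirement_names(requirements_text: str) -> List[str]:
    """Return the package names from simple requirements.txt content."""
    names = []
    for raw_line in requirements_text.splitlines():
        line = raw_line.strip()
        # Skip blanks, comments, and pip options such as '-r other.txt'.
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: List[str]) -> List[str]:
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    with open("requirements.txt", encoding="utf-8") as handle:
        not_installed = missing_packages(requirement_names(handle.read()))
    print("missing:", ", ".join(not_installed) if not_installed else "none")
```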

Adding your own tools

If you want to add packages to the requirements.txt file, I recommend using the pipreqs package. To install it:

pip install pipreqs

To build your requirements.txt automatically, just run the following command in the project directory:

pipreqs . --force

The --force flag will overwrite the existing requirements.txt file.
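For intuition, tools of this kind work from the import statements in your source files rather than from everything installed in your environment. The sketch below shows that general idea using the standard `ast` module. It is not pipreqs' actual implementation, and `top_level_imports` is a name invented for this example. A real tool also has to filter out standard-library modules and map import names to PyPI package names, which this sketch does not attempt.

```python
import ast
from typing import Set


def top_level_imports(source: str) -> Set[str]:
    """Collect the top-level module names imported by a piece of Python source."""
    tree = ast.parse(source)
    names: Set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # 'import os.path' contributes 'os'.
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, which is local code, not a package.
            if node.module and node.level == 0:
                names.add(node.module.split(".")[0])
    return names
```

For example, a script containing `import requests` and `from bs4 import BeautifulSoup` would yield the set `{"requests", "bs4"}`.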

Notes

These scripts were created and tested on Ubuntu 20.04.4 LTS with Python 3.8.10.

Acknowledgements
