This is intended to grow into a library of useful data scraping tools scripted in Python.
- Instagram images, comments, and data scraping
- Get photos from Google Maps based on coordinates
- Search and download photos from Google Images
- Convert video files to audio files
- Download videos from YouTube, Twitch, Vimeo, etc.
- Download audio files from YouTube
These scripts are mainly based on the ffmpeg and youtube-dl libraries. Please note that youtube-dl downloads can be very slow (~50 KB/s). The newer yt-dlp library is a fork of youtube-dl that offers much faster downloads (~5 MiB/s) and additional tools. More on these libraries in the links below.
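As a rough illustration of how the two libraries fit together, here is a minimal sketch, not the scripts shipped in this repo. The function names, the `downloads` output folder, and the `mp3` codec are assumptions made for the example. It assumes the `yt-dlp` Python package is installed and that the `ffmpeg` executable is on your PATH.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_audio_command(video_path, audio_path):
    """Return an ffmpeg command that extracts the audio track of a video."""
    return [
        "ffmpeg",
        "-y",                   # overwrite the output file if it already exists
        "-i", str(video_path),  # input video file
        "-vn",                  # drop the video stream, keep audio only
        str(audio_path),        # ffmpeg picks the codec from the file extension
    ]


def build_ytdlp_audio_options(output_dir="downloads", codec="mp3"):
    """Return yt-dlp options that fetch the best audio stream and convert it."""
    return {
        "format": "bestaudio/best",
        "outtmpl": str(Path(output_dir) / "%(title)s.%(ext)s"),
        "postprocessors": [
            # yt-dlp hands the downloaded file to ffmpeg for the conversion
            {"key": "FFmpegExtractAudio", "preferredcodec": codec},
        ],
    }


def convert_video_to_audio(video_path, audio_path):
    """Run ffmpeg on a local video file (hypothetical helper)."""
    subprocess.run(build_ffmpeg_audio_command(video_path, audio_path), check=True)


def download_audio(url, output_dir="downloads"):
    """Download the audio track of `url` (hypothetical helper)."""
    import yt_dlp  # imported lazily so the helpers above work without it

    with yt_dlp.YoutubeDL(build_ytdlp_audio_options(output_dir)) as ydl:
        ydl.download([url])
```

Calling `convert_video_to_audio("clip.mp4", "clip.mp3")` would convert a local file, and `download_audio(url)` would save the audio of an online video into `downloads/`. Keeping the command and option builders separate from the functions that run them makes it easy to inspect what will be executed before touching the network.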
- git
- You'll know you did it right if you can run
git --version
and you see a response like git version x.x.x
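If you prefer to run this check from Python, for example at the start of a setup script, the following sketch may help. The function names are hypothetical, and the parser assumes the usual `git version X.Y.Z` output format.

```python
import re
import shutil
import subprocess


def parse_git_version(output):
    """Extract (major, minor, patch) from text such as 'git version 2.25.1'."""
    match = re.search(r"git version (\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def installed_git_version():
    """Return the installed git version as a tuple, or None if git is missing."""
    if shutil.which("git") is None:  # git is not on the PATH
        return None
    result = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    )
    return parse_git_version(result.stdout)
```

`installed_git_version()` returns `None` when git cannot be found or its output does not match the assumed format, so a setup script can stop early with a clear message.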
Clone this repo
git clone https://github.com/VidiHawk/web-scraping-tools
cd web-scraping-tools
Then install the dependencies
pip install -r requirements.txt
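To confirm that the install worked, one option is to compare the requirements file against what is installed in the current environment. This is a minimal sketch with hypothetical function names. It assumes simple `name==version` style lines and does not handle URL or editable requirements.

```python
import re
from importlib import metadata  # in the standard library since Python 3.8


def requirement_names(lines):
    """Yield the package name from each requirement line, skipping comments."""
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name ends at the first version specifier, extra, marker, or space.
        yield re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]


def missing_packages(lines):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(open("requirements.txt"))` should return an empty list once every dependency is installed.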
If you want to add packages to the requirements.txt file, I recommend using the pipreqs package. To install it:
pip install pipreqs
To generate your requirements.txt automatically, run the following command in the project directory:
pipreqs . --force
The --force flag will overwrite the existing requirements.txt file.
These scripts were created and tested on Ubuntu 20.04.4 LTS with Python 3.8.10.