A Python script that scrapes the GKToday website to create monthly magazines and quizzes. Both are generated in .docx and .pdf format.
Below are screenshots of the script and the Telegram bot. You can see more screenshots here.
I have tested this script on Windows and Linux, although I have only set up the Telegram bot on Linux. On Windows, you can follow the guide through Step 2. The steps after Step 2 cover setting up the Telegram bot and apply to Linux only.
>> git clone ~/
If you are on Windows, you need to add it to the PATH.
Create a directory named gktoday in your home directory.
>> mkdir ~/gktoday
>> cd ~/gktoday
Initialize a Python virtual environment and activate it.
>> python3 -m venv env
>> source ./env/bin/activate
Install the required python libraries.
(env) >> pip install bs4 requests python-docx flask python-telegram-bot
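Before running the scraper, it can help to confirm that the libraries installed above are importable from the active environment. The following is a minimal sketch, not part of the project: the helper name `missing_modules` is hypothetical, and the list assumes the usual import names (python-docx is imported as `docx`, python-telegram-bot as `telegram`).

```python
import importlib.util

# Import names differ from the pip package names for some of these:
# python-docx is imported as "docx", python-telegram-bot as "telegram".
REQUIRED_MODULES = ["bs4", "requests", "docx", "flask", "telegram"]


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, check that the virtual environment is activated and rerun the pip command.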
Run the scraper.
(env) >> python ~/GKTodayScrape/scrape.py
The following section will help you set up your own Telegram bot to serve the converted PDF magazines.
How to Build Your First Telegram Bot: A Guide for Absolute Beginners
Note: Telegram webhooks only work over HTTPS, so you need an SSL certificate for this to work. Follow this guide to get one (yes, it's free).
Running Your Flask Application Over HTTPS
>> nohup ~/gktoday/env/bin/python ~/GKTodayScrape/app.py >> ~/gktoday/log/nohup_app.py.log 2>&1 &
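This also writes the output of the app to ~/gktoday/log/nohup_app.py.log. The ~/gktoday/log directory must already exist, otherwise the shell cannot open the log file for the redirect.
If something goes wrong, you can check this log file for errors.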
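As a rough illustration of checking the log, the sketch below prints the lines of nohup_app.py.log that look like errors. It is an assumption-laden helper, not part of the project: the function name `find_error_lines` and the marker strings are hypothetical, chosen because Python tracebacks normally contain them.

```python
from pathlib import Path

# Assumed markers; adjust them to whatever app.py actually prints on failure.
ERROR_MARKERS = ("Traceback", "Error", "Exception")


def find_error_lines(lines, markers=ERROR_MARKERS):
    """Return (line_number, text) pairs for lines containing any marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(marker in line for marker in markers):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    log_path = Path.home() / "gktoday" / "log" / "nohup_app.py.log"
    if log_path.exists():
        with log_path.open(encoding="utf-8", errors="replace") as handle:
            for number, text in find_error_lines(handle):
                print(f"{number}: {text}")
    else:
        print(f"No log file at {log_path}")
```

The same function can be pointed at cron_scrape.log by changing the path.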
Note: This is optional, but you may want to add a cron job on your Linux server to periodically scrape files from GKToday.in. The entry below runs scrape.py at minute 0 of every fourth hour.
>> crontab -e
0 */4 * * * ~/gktoday/env/bin/python ~/GKTodayScrape/scrape.py >> ~/gktoday/log/cron_scrape.log 2>&1
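To make the schedule in the cron entry concrete, here is a small sketch that expands the minute and hour fields into the times of day at which the job fires. It handles only the field forms `*`, `N`, and `*/S`; the function names are hypothetical and this is not a full cron parser.

```python
def expand_step_field(field, minimum, maximum):
    """Expand a cron field of the form '*', 'N', or '*/S' into a list of values.

    Only these three forms are handled; ranges and lists are out of scope.
    """
    if field == "*":
        return list(range(minimum, maximum + 1))
    if field.startswith("*/"):
        step = int(field[2:])
        return list(range(minimum, maximum + 1, step))
    return [int(field)]


def daily_run_times(minute_field, hour_field):
    """Return 'HH:MM' strings for every run within one day."""
    minutes = expand_step_field(minute_field, 0, 59)
    hours = expand_step_field(hour_field, 0, 23)
    return [f"{h:02d}:{m:02d}" for h in hours for m in minutes]


if __name__ == "__main__":
    # Fields taken from the crontab entry: minute "0", hour "*/4".
    print(daily_run_times("0", "*/4"))
```

For the entry above this gives six runs per day, starting at 00:00 and then every four hours.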