A simple tool that scrapes Reddit posts matching a list of keywords and saves them to a neatly formatted .xlsx file.
Use the package manager pip to install the required libraries.
pip install -r requirements.txt
Or, using pipenv:
pipenv install
If you do not have a Reddit account, you must first sign up for one.
- Go to Reddit Apps.
- Select “script” as the type of app.
- Name your app and give it a description.
- Set the redirect URI to http://localhost:8080. The redirect URI will be used to get your refresh token.
- Once you click “create app”, you will see a box showing your client_id and client_secret.
- In the folder containing this README file (the main folder for this project), open the .env file, enter the client ID, client secret, and user agent as shown below, and save the file.
client_id = "YourClientIDHere"
client_secret = "YourClientSecretHere"
user_agent = "YourAppNameHere"
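For reference, the script builds its Reddit client from these values. Below is a minimal sketch of how that might look, assuming python-dotenv is used to load the .env file; the variable names mirror the keys above, and allsubs is the search target used in the snippet further down.

```python
import os

import praw
from dotenv import load_dotenv

load_dotenv()  # read client_id, client_secret and user_agent from .env

reddit = praw.Reddit(
    client_id=os.getenv("client_id"),
    client_secret=os.getenv("client_secret"),
    user_agent=os.getenv("user_agent"),
)

# Search across all of Reddit rather than a single subreddit.
allsubs = reddit.subreddit("all")
```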
To change the list of phrases and keywords, open the keywords.txt file under the Keywords&Lists directory.
List one keyword or phrase per line, like this:
This is an example phrase
KeywordExample
YouGetTheIdea
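For reference, reading this file into the script can be as simple as the sketch below; the path Keywords&Lists/keywords.txt and the loop are assumptions based on the layout described above, and keyword_search is the function shown in the next section.

```python
# Read one keyword or phrase per line, skipping blank lines.
with open("Keywords&Lists/keywords.txt", encoding="utf-8") as f:
    keywords = [line.strip() for line in f if line.strip()]

for keyword in keywords:
    keyword_search(keyword)
```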
By default, this script scrapes the top 100 posts for each keyword or phrase within the chosen time period. To change this, adjust the limit argument in the keyword_search function.
def keyword_search(keyword):
    for submission in allsubs.search(
        keyword, sort="top", syntax="lucene", time_filter=data_time, limit=100):
Raising the limit may result in hitting the API request limit. The maximum Reddit returns per search is about 1000 results, which you can request by setting limit to None.
More information on this can be found in the PRAW API docs.
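For example, the search call above might be changed like this to request the maximum number of results (a sketch; allsubs, keyword, and data_time come from the surrounding script):

```python
# limit=None tells PRAW to fetch as many results as the API allows (~1000).
for submission in allsubs.search(
    keyword, sort="top", syntax="lucene", time_filter=data_time, limit=None
):
    ...
```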
You can filter out results from specific subreddits by editing the filtered_subreddits.txt file under the Keywords&Lists directory.
List the unwanted subreddits one per line, like this:
exampleSubreddit
UGetTheIdea
IhOpe
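For reference, applying this filter could look like the sketch below; the file path, the keep helper, and the case-insensitive comparison are illustrative assumptions, not the script's exact code.

```python
# Load the subreddits to exclude, one name per line, compared case-insensitively.
with open("Keywords&Lists/filtered_subreddits.txt", encoding="utf-8") as f:
    filtered = {line.strip().lower() for line in f if line.strip()}

def keep(submission):
    """Return True if the submission's subreddit is not in the filtered list."""
    return submission.subreddit.display_name.lower() not in filtered
```

Inside keyword_search, submissions for which keep(submission) returns False would simply be skipped before being written to the spreadsheet.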