A powerful social media web crawler and scraper that dumps images, tweets, captions, external links and hashtags from Instagram and Twitter in an organized form. It also shows the most relevant hashtags along with how often each one occurs in the scraped posts.
Download or clone the repo, navigate to the directory containing the files, and run the following command in a terminal (cmd on Windows):
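To make the hashtag frequency idea concrete, here is a minimal sketch of how such a count could be computed over scraped captions. The helper name `hashtag_frequencies` and the sample captions are assumptions made for illustration; they are not taken from this project's code, which may count hashtags differently.

```python
import re
from collections import Counter

# Hypothetical helper for illustration; not part of this project's API.
HASHTAG_RE = re.compile(r"#(\w+)")


def hashtag_frequencies(captions, top_n=5):
    """Return the top_n (hashtag, count) pairs found across all captions."""
    counts = Counter()
    for text in captions:
        # Lowercase so that "#Travel" and "#travel" count as the same tag.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(text))
    return counts.most_common(top_n)


# Made-up sample captions, only to show the shape of the result.
sample = [
    "Sunset at the beach #travel #sunset",
    "New post! #Travel #food",
    "Weekend vibes #travel #food",
]
print(hashtag_frequencies(sample, top_n=2))  # [('travel', 3), ('food', 2)]
```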
python setup.py install
After downloading or cloning the repo, download the Chrome WebDriver (ChromeDriver) or the Firefox WebDriver (geckodriver) and extract it into the root folder of the project.
To scrape Twitter, you need to set up a Twitter App. First, log in to your Twitter account and go to Twitter Apps. Create a new app (How to create twitter app), go to Keys and access tokens, and copy the Consumer Key, Consumer Secret, Access Token and Access Token Secret. You will need them later.
Once you have created a Twitter App and installed the dependencies, you are good to go. The variables used to initialize the scrapers are described below.
Variable | Default | Description |
---|---|---|
tag | Null | The keyword to search |
limit | 20 | Number of posts to scrape |
Consumer_Key | Null | Consumer Key of Twitter App |
Consumer_Secret | Null | Consumer Secret of Twitter App |
Access_Token | Null | Access Token of Twitter App |
Access_Token_Secret | Null | Access Token Secret of Twitter App |
lang | 'en' | Language of tweets to retrieve |
browser | 'chrome' | Either chrome or firefox to use |
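As a rough illustration of how these variables and their defaults fit together, the sketch below collects them in a small settings object and checks them before a scraper would be started. The class `ScraperSettings`, its `validate` method and the environment variable names are all hypothetical; they are not this project's API, so check the source for the actual class names and argument spelling.

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScraperSettings:
    """Hypothetical container mirroring the table above (not the project's class)."""
    tag: Optional[str] = None
    limit: int = 20
    consumer_key: Optional[str] = None
    consumer_secret: Optional[str] = None
    access_token: Optional[str] = None
    access_token_secret: Optional[str] = None
    lang: str = "en"
    browser: str = "chrome"

    def validate(self, twitter=True):
        """Raise ValueError if a required value is missing or invalid."""
        if not self.tag:
            raise ValueError("tag is required")
        if self.limit < 1:
            raise ValueError("limit must be at least 1")
        if self.browser not in ("chrome", "firefox"):
            raise ValueError("browser must be 'chrome' or 'firefox'")
        if twitter:
            # The four Twitter App credentials are only needed for Twitter.
            keys = ("consumer_key", "consumer_secret",
                    "access_token", "access_token_secret")
            missing = [k for k in keys if not getattr(self, k)]
            if missing:
                raise ValueError("missing Twitter credentials: " + ", ".join(missing))


# Assumed environment variable names; reading secrets from the environment
# keeps them out of source control.
settings = ScraperSettings(
    tag="python",
    consumer_key=os.environ.get("TWITTER_CONSUMER_KEY"),
    consumer_secret=os.environ.get("TWITTER_CONSUMER_SECRET"),
    access_token=os.environ.get("TWITTER_ACCESS_TOKEN"),
    access_token_secret=os.environ.get("TWITTER_ACCESS_TOKEN_SECRET"),
)
settings.validate(twitter=False)  # credentials are not checked in this mode
print(settings.limit, settings.lang, settings.browser)
```

Validating up front means a missing key or a misspelled browser name is reported before any browser window is opened or any request is sent.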
- Python 3.x
- Tweepy
- Selenium
- Urllib
- openpyxl
- Fork it
- Create your feature branch: git checkout -b my-new-feature
- Commit your changes: git commit -am 'Add some feature'
- Push to the branch: git push origin my-new-feature
- Submit a pull request
Muhammad Ali Zia
This project is licensed under the MIT License.