Scraper of tweets on a given subject. (SDIA 2023 project for TPS students)

cognitivefactory/twitter-scraper

Twitter Scraper

  1. ✏️ Setup
  2. 💁 More infos and Usage
  3. 🧪 Testing
  4. 🧑‍🏫 Contributing
  5. ⚖️ License
  6. 🔄 Changelog
  7. 🐛 Bugs & TODO

✏️ Setup

Note: This project is currently under development and is not yet ready for production.

First, install the required packages with the following command:

pip install --upgrade -r requirements.txt

Next, set up a Twitter developer account and create a new app to get your API keys. You can find more information here.

Finally, create a new file named .env in the root directory of the project and add the following lines (based on .env.example), filling in your own values:

API_KEY =
API_KEY_SECRET =
BEARER_TOKEN =
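
To check that your .env file is complete before running the scraper, something like the following could help. This is only a minimal standard-library sketch: the helper names `parse_env` and `missing_keys` are hypothetical and not part of this project, and the project itself may load the file differently (for example with a dedicated dotenv package).

```python
from pathlib import Path

# The three keys listed in the .env template above.
REQUIRED_KEYS = ("API_KEY", "API_KEY_SECRET", "BEARER_TOKEN")


def parse_env(text: str) -> dict[str, str]:
    """Parse `KEY = value` lines, skipping blank lines and # comments.

    Hypothetical helper for illustration only.
    """
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Tolerate spaces around "=" and optional surrounding quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or left empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    env_path = Path(".env")  # assumed to sit in the project root
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    missing = missing_keys(parse_env(text))
    if missing:
        print("Missing or empty keys in .env:", ", ".join(missing))
    else:
        print("All API keys are set.")
```

Run it from the project root. An untouched copy of .env.example should report all three keys as missing.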

💁 More infos and Usage

🧪 Testing

Oh god! Please don't... Still, make sure you have pytest installed and run the following command:

pytest .\twitter_scraper\

You can also use the VS Code UI to run the tests.

🧑‍🏫 Contributing

If you ever want to contribute, please begin by reading our Contributing Guidelines.

The standard procedure is:

fork -> git branch -> push -> pull request

Note that we won't accept any PR:

  • that does not follow our Contributing Guidelines
  • that is not sufficiently commented or isn't well formatted
  • without any proper test suite
  • with a failing or incomplete test suite

Happy coding! 🙂

⚖️ License

This project is licensed under the CeCILL-C FREE SOFTWARE LICENSE AGREEMENT. For more information, please refer to the official website.

🔄 Changelog

See changelog.md for more information.

gantt
    title Main Versions
    dateFormat YYYY-MM-DD

    section source Code (v0)
    v0.1 : 2023-01-16, 1d
    v0.2 :             2d
    v0.3 :             2d

    section stable Versions
    v1   : 2023-01-19, 9d
Stable Version 1

v1.0 first stable release

  • collections.abc instead of typing (deprecated)
  • lowered the requirements
  • minimum supported Python version is now 3.10.6

v1.1 more queries and less storage

  • encoded tweet.content into bytes for storage
  • added retweet and reply selectors to SearchQuery

🐛 Bugs & TODO

known bugs (final correction patch version), see Issues

  • tweet.date is always None when scraping (stored as 0)

todo (first implementation version)

  • encode tweet.content into bytes for storage
  • should add tweet.date back in when scraping
  • add large search queries
  • a posteriori tweet inspection
