hn-archivevault is a Python project for archiving stories and comments from Hacker News. It stores data with sqlite3 and uses a custom library, hnconnector, to interact with the Hacker News API. It supports both an initial full capture and periodic updates to the archive.
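This README does not document hnconnector's interface, so the sketch below does not use it. To illustrate the kind of calls such a library might wrap, here is a minimal stdlib-only client for the public Hacker News API at https://hacker-news.firebaseio.com/v0/. The `item/<id>.json` and `maxitem.json` endpoints and the `type` field are part of that public API; the function names (`item_url`, `fetch_item`, `fetch_max_item_id`, `target_table`) are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Base URL of the official, Firebase-hosted Hacker News API.
# NOTE: this is not hnconnector's API, which is not documented here.
HN_API_BASE = "https://hacker-news.firebaseio.com/v0"


def item_url(item_id: int) -> str:
    """Build the URL for a single HN item (story, comment, job, poll, ...)."""
    return f"{HN_API_BASE}/item/{item_id}.json"


def fetch_item(item_id: int, timeout: float = 10.0) -> Optional[Dict[str, Any]]:
    """Fetch one item as a dict; the API returns JSON null (None) for missing ids."""
    with urllib.request.urlopen(item_url(item_id), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_max_item_id(timeout: float = 10.0) -> int:
    """Return the largest item id Hacker News has assigned so far."""
    with urllib.request.urlopen(f"{HN_API_BASE}/maxitem.json", timeout=timeout) as resp:
        return int(json.loads(resp.read().decode("utf-8")))


def target_table(item: Dict[str, Any]) -> Optional[str]:
    """Hypothetical routing rule: pick the archive table from the item's 'type'."""
    kind = item.get("type")
    if kind == "story":
        return "stories"
    if kind == "comment":
        return "comments"
    return None  # jobs, polls, etc. are skipped in this sketch
```

Every HN item, whether a story or a comment, shares a single id space, which is why routing on the `type` field is enough to decide between the two tables.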
This GitHub project is intended solely for coding practice and for exploring Hacker News data. It lets users archive stories and comments for personal use or development purposes. Please note that the project provides only the code; it does not publish the archived database. It is meant as a foundation for building new UIs or for personally exploring trends in Hacker News data over time.
To install, clone the repository and install the dependencies:
git clone https://github.com//hn-archivevault.git
cd hn-archivevault
pip install -r requirements.txt
To initialize the database, run:
python initial_db_setup.py
This creates the SQLite database file hn_archive.db with two tables: stories and comments.
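The README does not list the columns of the two tables. A plausible schema, assuming columns modeled on fields of the public HN item API (`id`, `by`, `time`, `title`, `url`, `score`, `descendants`, `text`, `parent`), might look like the sketch below. The column names, the `init_db` and `save_item` helpers, and the choice to store `by` as `author` are all hypothetical, not taken from initial_db_setup.py.

```python
import sqlite3
from typing import Any, Dict

# Hypothetical schema; the real initial_db_setup.py may use different columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stories (
    id          INTEGER PRIMARY KEY,  -- HN item id
    author      TEXT,                 -- the API's 'by' field
    time        INTEGER,              -- Unix timestamp
    title       TEXT,
    url         TEXT,
    score       INTEGER,
    descendants INTEGER               -- total comment count
);
CREATE TABLE IF NOT EXISTS comments (
    id      INTEGER PRIMARY KEY,      -- HN item id
    author  TEXT,
    time    INTEGER,
    text    TEXT,                     -- HTML body of the comment
    parent  INTEGER                   -- id of the parent story or comment
);
"""


def init_db(path: str = "hn_archive.db") -> sqlite3.Connection:
    """Create (or open) the archive and ensure both tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn


def save_item(conn: sqlite3.Connection, item: Dict[str, Any]) -> None:
    """Upsert one API item; INSERT OR REPLACE lets update runs refresh rows."""
    if item.get("type") == "story":
        conn.execute(
            "INSERT OR REPLACE INTO stories "
            "(id, author, time, title, url, score, descendants) "
            "VALUES (?, ?, ?, ?, ?, ?, ?)",
            (item["id"], item.get("by"), item.get("time"), item.get("title"),
             item.get("url"), item.get("score"), item.get("descendants")),
        )
    elif item.get("type") == "comment":
        conn.execute(
            "INSERT OR REPLACE INTO comments (id, author, time, text, parent) "
            "VALUES (?, ?, ?, ?, ?)",
            (item["id"], item.get("by"), item.get("time"),
             item.get("text"), item.get("parent")),
        )
```

Using the HN item id as the primary key means re-fetching an item overwrites its row instead of duplicating it.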
To start the archiving process for the first time, run:
python crawler.py --first_run
For subsequent updates to the archive, omit the flag:
python crawler.py
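The internals of crawler.py are not described here. One way the `--first_run` flag could switch between an initial capture and an incremental update is sketched below, assuming the crawler walks item ids in ascending order and, on update runs, resumes after the highest id already stored. The resume strategy, the `initial_start` parameter, and the helper names are assumptions for illustration.

```python
import argparse
import sqlite3
from typing import Optional, Sequence


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    """Parse CLI flags; --first_run matches the flag shown in this README."""
    parser = argparse.ArgumentParser(description="Archive Hacker News items.")
    # store_true: the flag defaults to False when omitted (an update run).
    parser.add_argument("--first_run", action="store_true",
                        help="perform the initial capture instead of an update")
    return parser.parse_args(argv)


def last_archived_id(conn: sqlite3.Connection) -> int:
    """Highest item id stored in either table, or 0 for an empty archive."""
    row = conn.execute(
        "SELECT MAX(id) FROM (SELECT id FROM stories UNION ALL SELECT id FROM comments)"
    ).fetchone()
    return row[0] or 0  # MAX() over no rows yields NULL -> None


def start_id(conn: sqlite3.Connection, first_run: bool, initial_start: int = 1) -> int:
    """Hypothetical policy: a first run starts at initial_start; updates resume after the last id."""
    if first_run:
        return initial_start
    return last_archived_id(conn) + 1
```

Resuming strictly after the newest stored id picks up newly created stories and comments, but not later changes to items already archived (for example, updated scores). The public API's `updates.json` endpoint, which lists recently changed item ids, is one way such refreshes could be handled.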
In an era when many social media platforms have closed off or discontinued their public APIs, I extend my sincere appreciation to Hacker News for keeping its data openly accessible. This openness is invaluable for developers, researchers, and enthusiasts who want to build tools, third-party applications, and UIs, conduct analyses, or simply explore data in new ways. I believe open data policies contribute significantly to the richness of the internet ecosystem, fostering creativity, transparency, and community engagement.
This project is licensed under the MIT License; see the LICENSE file for details.