- Clone the repository:
    git clone https://github.com/MilenPlamenov/django_etl.git
    cd django_etl
- Create and activate a virtual environment:
    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
- Install dependencies:
    pip install -r requirements.txt
- Apply Django migrations:
    python manage.py makemigrations
    python manage.py migrate
- Run the Django development server:
    python manage.py runserver
- Run the React development server:
    cd frontend
    npm run dev
- Run the Scrapy spiders to populate the database:
    cd articles_scraper
    scrapy crawl restofworld
    scrapy crawl capitalbrief
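
The two crawl commands above can also be driven from a small Python script, for example to run every spider in one go. This is only a sketch and not part of the repository: the helper names `crawl_command` and `run_spiders` are hypothetical, and it assumes `scrapy` is on the PATH and the script is started from the repository root.

```python
import subprocess

# Spider names taken from the crawl commands above.
SPIDERS = ["restofworld", "capitalbrief"]


def crawl_command(spider):
    """Return the argv list for crawling one spider (hypothetical helper)."""
    return ["scrapy", "crawl", spider]


def run_spiders(spiders=SPIDERS, project_dir="articles_scraper"):
    """Run each spider in turn from the Scrapy project directory.

    Raises subprocess.CalledProcessError if a crawl exits non-zero.
    """
    for spider in spiders:
        subprocess.run(crawl_command(spider), cwd=project_dir, check=True)
```

Calling `run_spiders()` runs the spiders one after another and stops at the first failing crawl.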
API endpoints:
- `GET /api/articles/`: Get all articles.
- `GET /api/article/<id>/`: Get a single article by ID.
- `POST /api/article/create/`: Create a new article.
- `PUT /api/article/update/<id>/`: Update an article.
- `DELETE /api/article/delete/<id>/`: Delete an article.
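
As a rough illustration of how these endpoints might be called from Python, the sketch below builds requests with the standard library. It makes several assumptions: the server runs at Django's default development address, the API accepts and returns JSON, and the `title` field in the payload is purely hypothetical (the real article fields are not listed here). The helpers `build_request` and `send` are illustrative names, not part of the project.

```python
import json
import urllib.request

# Default address of `python manage.py runserver`.
BASE_URL = "http://127.0.0.1:8000"


def build_request(method, path, payload=None):
    """Build (but do not send) a request for one of the endpoints."""
    data = None
    headers = {}
    if payload is not None:
        # Assumption: the API expects a JSON request body.
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


def send(request):
    """Send a request and decode a JSON response body (empty body gives None)."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None


list_req = build_request("GET", "/api/articles/")
# "title" is a hypothetical field name used only for illustration.
create_req = build_request("POST", "/api/article/create/", {"title": "Example"})
delete_req = build_request("DELETE", "/api/article/delete/1/")
```

With the development server running, `send(list_req)` would fetch the article list; building and sending are kept separate so the requests can be inspected without a live server.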
Planned improvements:
- Add custom templates for the REST API
- Implement an ArticleSpider base class so CapitalBriefSpider and RestOfWorldSpider can inherit from it
- Extend the spiders so they can scrape more content
- Add NER (spaCy) to extract entities during the crawl
- Improve the project structure
- Refactor the scraper's pipelines.py
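
One possible shape for the shared ArticleSpider base class mentioned in the list above is sketched here. It is an assumption about the design, not the project's code: the base class would presumably subclass `scrapy.Spider` in the real project, but it is kept framework-free so the example runs on its own, and the `source` attribute, the `build_item` method, and the item fields are all hypothetical.

```python
class ArticleSpider:
    """Shared behaviour for site-specific spiders (hypothetical sketch).

    In the real project this would presumably subclass scrapy.Spider.
    """

    name = None    # spider name, set by each subclass
    source = None  # human-readable site name (hypothetical attribute)

    def build_item(self, title, url):
        """Normalise scraped values into one common item shape."""
        return {
            "source": self.source,
            "title": " ".join(title.split()),  # collapse stray whitespace
            "url": url.strip(),
        }


class RestOfWorldSpider(ArticleSpider):
    name = "restofworld"
    source = "Rest of World"


class CapitalBriefSpider(ArticleSpider):
    name = "capitalbrief"
    source = "Capital Brief"


item = CapitalBriefSpider().build_item("  Some\n headline ", " https://example.com/a ")
```

Keeping the cleanup logic in the base class means each site-specific spider only declares what differs, such as its name and source.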