ETL Articles project using Django x Scrapy and React JS

MilenPlamenov/django_etl

Articles ETL Project

Setup and Installation

  1. Clone the repository:

    git clone https://github.com/MilenPlamenov/django_etl.git
    cd django_etl
  2. Create and activate a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install dependencies:

    pip install -r requirements.txt
  4. Apply Django migrations:

    python manage.py makemigrations
    python manage.py migrate
  5. Run the Django development server:

    python manage.py runserver
  6. Run the React development server:

    cd frontend
    npm install  # first run only, installs the frontend dependencies
    npm run dev
  7. To run the Scrapy spiders and populate the database:

    cd articles_scraper
    scrapy crawl restofworld
    scrapy crawl capitalbrief

API Endpoints

  • GET /api/articles/ - Get all articles.
  • GET /api/article/<id>/ - Get a single article by ID.
  • POST /api/article/create/ - Create a new article.
  • PUT /api/article/update/<id>/ - Update an article.
  • DELETE /api/article/delete/<id>/ - Delete an article.
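
The endpoints above can be exercised from Python with the standard library alone. The following is a minimal sketch, not part of the repository: it assumes the API is served at `http://127.0.0.1:8000` (Django's default `runserver` address) and accepts JSON bodies. The helper names `article_url`, `build_request` and `send`, and any payload fields such as `title`, are hypothetical and would need to match the project's actual serializer.

```python
import json
from urllib import request

# Assumption: Django's default development server address.
BASE_URL = "http://127.0.0.1:8000"

# HTTP method for each endpoint listed in this README.
METHODS = {
    "list": "GET",
    "detail": "GET",
    "create": "POST",
    "update": "PUT",
    "delete": "DELETE",
}


def article_url(action, article_id=None):
    """Build the full URL for one of the README's endpoints."""
    if action in ("detail", "update", "delete") and article_id is None:
        raise ValueError(f"'{action}' requires an article_id")
    paths = {
        "list": "/api/articles/",
        "detail": f"/api/article/{article_id}/",
        "create": "/api/article/create/",
        "update": f"/api/article/update/{article_id}/",
        "delete": f"/api/article/delete/{article_id}/",
    }
    return BASE_URL + paths[action]


def build_request(action, article_id=None, payload=None):
    """Prepare (but do not send) a request for the given endpoint."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return request.Request(
        article_url(action, article_id),
        data=data,
        headers=headers,
        method=METHODS[action],
    )


def send(req):
    """Send a prepared request and decode a JSON response body, if any."""
    with request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None
```

With the Django development server running, `send(build_request("list"))` would fetch all articles, and `send(build_request("update", 3, {"title": "New title"}))` would update article 3, provided `title` is a field the API accepts.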

TODO in the future

  • Add custom templates for the REST API
  • Implement an ArticleSpider base class that CapitalBriefSpider and RestOfWorldSpider can inherit from
  • Extend the scrapers to collect more content
  • Add named-entity recognition (NER) with spaCy for entity extraction
  • Improve the project structure
  • Refactor the scraper's pipelines.py
