GitHub - amirbnprogramming/Django-Techcrunch-Scrapper: A scraper built with BeautifulSoup4, Celery, and Django. You can use it to search the TechCrunch.com website.

Django Techcrunch Scrapper

DjangoTechcrunchScrapper is a Django app that scrapes items from the TechCrunch.com website. The scraped data are authors, categories, and articles. The application was developed and tested with Django v4.2.

Quick start

Install all the packages and requirements with:
```
pip install -r requirements.txt
```
Install a message broker such as RabbitMQ or Redis.

Set your project-specific custom settings in settings.py.

Set your custom Celery settings in settings.py under the CELERY namespace:

    # CELERY SECTION
    CELERY_BROKER_URL = 'amqp://localhost:your port'  # for RabbitMQ
    CELERY_TIMEZONE = 'Your timezone'
    CELERY_TASK_TIME_LIMIT = 60 * 60  # one hour, in seconds
    CELERY_RESULT_BACKEND = 'django-db'
    CELERY_TASK_SERIALIZER = 'json'
    CELERY_RESULT_SERIALIZER = 'json'
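
Because Redis is mentioned as an alternative broker, a settings fragment for that case might look like the sketch below. The Redis URL (local host, default port 6379, database 0) and the 'UTC' timezone are assumptions, not values taken from this repository. The 'django-db' result backend comes from the django-celery-results package, whose app has to be registered in INSTALLED_APPS; the `EXTRA_INSTALLED_APPS` name is hypothetical and only keeps the fragment self-contained.

```python
# settings.py (fragment): illustrative values, adjust to your environment

# Assumption: Redis runs locally on its default port 6379, database 0.
CELERY_BROKER_URL = 'redis://localhost:6379/0'
CELERY_TIMEZONE = 'UTC'  # assumption: replace with your own timezone
CELERY_TASK_TIME_LIMIT = 60 * 60  # hard limit per task: one hour, in seconds
CELERY_RESULT_BACKEND = 'django-db'  # results stored via django-celery-results
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

# Hypothetical helper list: in a real settings.py, append this entry
# to the existing INSTALLED_APPS instead.
EXTRA_INSTALLED_APPS = ['django_celery_results']
```

After registering the app, running `python manage.py migrate` creates the tables that store the task results.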

Open a terminal, then create and apply the migrations for the models:

    python manage.py makemigrations
    python manage.py migrate

Next, set the Celery beat schedule. Open celery.py, find the schedule, and change its value (in seconds) to change how often the task runs:

    app.conf.beat_schedule = {
        'every-day-start-daily-scrape': {
            'task': 'techcrunch.tasks.daily_scrape_task',
            'schedule': 86400,  # one day, in seconds
        },
    }
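
Celery beat also accepts a `datetime.timedelta` as the `schedule` value, which can be easier to read than a raw number of seconds. The sketch below is a standalone illustration: the plain `beat_schedule` dict is a stand-in for `app.conf.beat_schedule`, and it only checks that one day equals the 86400 seconds used above.

```python
from datetime import timedelta

# Stand-in for app.conf.beat_schedule in celery.py (assumption: same
# entry name and task path as in the schedule shown above).
beat_schedule = {
    'every-day-start-daily-scrape': {
        'task': 'techcrunch.tasks.daily_scrape_task',
        'schedule': timedelta(days=1),  # equivalent to 86400 seconds
    },
}

# Convert the interval back to seconds to compare with the integer form.
interval = beat_schedule['every-day-start-daily-scrape']['schedule']
interval_seconds = interval.total_seconds()
print(interval_seconds)  # 86400.0
```

To scrape more often, a value such as `timedelta(hours=6)` could be used in the same place.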

You must be logged in to use some of the services, so first create a superuser:
```
python manage.py createsuperuser
```
Then log in at host:port/admin.

After configuring Celery, start the Celery worker and Celery beat together, each in its own terminal:

    celery -A techcrunch_scrapper_with_django worker -l INFO -P eventlet
    celery -A techcrunch_scrapper_with_django beat --loglevel=INFO

Finally, run the Django development server to start the app:
```
python manage.py runserver
```

Links description:

admin/ => admin panel
manual_daily_search [name='manual_daily_search'] => manual daily scraping without Celery beat
search_keyword [name='search_keyword'] => search-by-keyword page
diagrams/<slug:model_name> [name='diagrams'] => draw diagrams:
diagrams/author => number of articles by each author
diagrams/category => number of articles in each category
diagrams/article => number of articles found by keyword search

The generated diagrams are saved in basedirectory / exports ...
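
To make the diagram descriptions concrete, the count behind diagrams/author can be sketched in plain Python. This is not the project's implementation, which presumably queries its Django models; `collections.Counter` is swapped in here, and the sample rows and the `articles_per_author` helper are made up for illustration.

```python
from collections import Counter

# Made-up sample rows standing in for scraped articles: (title, author name).
articles = [
    ("Article A", "Author One"),
    ("Article B", "Author Two"),
    ("Article C", "Author One"),
]


def articles_per_author(rows):
    """Return a Counter mapping each author to their number of articles."""
    return Counter(author for _title, author in rows)


counts = articles_per_author(articles)

# Print authors from most to least articles, as a bar chart would order them.
for author, total in counts.most_common():
    print(f"{author}: {total}")
```

The same grouping idea applies to diagrams/category, with the category name used as the key instead of the author.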

