A Python web scraper to scrape internships from Internshala, with the option to integrate Google Sheets and sync it with GitHub Actions

rohit1kumar/internsync

InternSync

A Playwright-based web scraper, written in Python, that scrapes internships from Internshala. Data is stored in a CSV file.

Disclaimer

This is for educational purposes only. I am not responsible for any misuse of this code.

Features

  • Store data in a CSV file
  • Store data in a Google Sheet (optional)
  • Keep the Google Sheet data in sync using GitHub Actions (optional)
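
As a rough illustration of the CSV feature, here is a minimal sketch that writes rows with Python's standard csv module. The column names and the write_internships helper are assumptions made for this example; the actual columns produced by main.py may differ.

```python
import csv

# Hypothetical column names, chosen only for illustration.
FIELDNAMES = ["title", "company", "location", "stipend"]


def write_internships(rows, path):
    """Write a list of dicts to a CSV file with a header row."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    sample = [
        {
            "title": "Backend Intern",
            "company": "Example Co",
            "location": "Remote",
            "stipend": "10000",
        }
    ]
    write_internships(sample, "internships.csv")
```

Using DictWriter keeps each row tied to named columns, so reordering or adding a column only requires changing the field list.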

Prerequisites

Make sure you have the following dependencies installed:

You can also install them using the following command:

pip install playwright && playwright install chromium

Usage

  1. Clone the repository
  2. Install the dependencies using pip install -r requirements.txt
  3. Run the script using python main.py

Optional steps (for Google Sheets mode only):

  1. Create a new Google Sheet
  2. Create a new project in Google Cloud Platform
  3. Follow this guide for setting up the Google Sheets API
  4. Download the JSON file and add all the credentials to the .env file (refer to .env.example)
  5. Get the Google Sheet ID from the URL, e.g. https://docs.google.com/spreadsheets/d/GOOGLE_SHEET_ID/edit
  6. Add the GOOGLE_SHEET_ID to the .env file
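
If you would rather not copy the ID out of the URL by hand, the step above can be sketched as a small helper. This is only an illustration: extract_sheet_id is a hypothetical function, not part of this repository, and it assumes the ID is the path segment that follows /spreadsheets/d/.

```python
import re

# Assumes the sheet ID is the path segment right after /spreadsheets/d/.
SHEET_ID_PATTERN = re.compile(r"/spreadsheets/d/([A-Za-z0-9_-]+)")


def extract_sheet_id(url: str) -> str:
    """Return the sheet ID found in a Google Sheets URL."""
    match = SHEET_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No Google Sheet ID found in URL: {url}")
    return match.group(1)


if __name__ == "__main__":
    url = "https://docs.google.com/spreadsheets/d/GOOGLE_SHEET_ID/edit"
    # Prints a line that can be pasted into the .env file.
    print(f"GOOGLE_SHEET_ID={extract_sheet_id(url)}")
```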

Optional steps (for syncing Google Sheets using GitHub Actions):

  1. GitHub Actions are already set up in the repository
  2. Install the GitHub CLI, or add the secrets from the .env file to the repository manually
  3. With the GitHub CLI, run gh secret set -R <your-username/your-repo> -f .env

Options

  • --headful: Run the script in non-headless mode (show the browser)
    python main.py --headful
  • --gs: Run the script in Google Sheets mode (store data in Google Sheets)
    python main.py --gs
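
For readers curious how two such flags are typically wired up, here is a minimal sketch using Python's standard argparse module. It is an assumption about the general approach, not a copy of main.py; the build_parser function and the help strings are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with the two boolean flags described above."""
    parser = argparse.ArgumentParser(
        description="Scrape internships from Internshala"
    )
    # store_true makes each flag default to False when it is omitted.
    parser.add_argument(
        "--headful",
        action="store_true",
        help="run in non-headless mode (show the browser)",
    )
    parser.add_argument(
        "--gs",
        action="store_true",
        help="store data in Google Sheets",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"headful={args.headful}, google_sheets={args.gs}")
```

Because both flags use store_true, they can be combined in one invocation, and omitting both leaves the script in headless, CSV-only mode under this sketch.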
