
tommasogritti/rank-gazer


Rank Gazer: Daily App Store Ranking Scraper


This repository contains a Python script and a GitHub Actions workflow that together scrape data from multiple Apple App Store app pages. The workflow runs daily at 09:00 UTC, or can be triggered manually, and collects the following for each app:

  • App ranking in the Medical category.
  • App star rating.
  • Total number of ratings.

The scraped data is appended to a CSV file stored in a separate branch (data-branch) to keep the main branch clean.

Features

  • Scrapes data from multiple App Store app pages.
  • Runs automatically every day at 09:00 UTC.
  • Stores data in a CSV file (apps_ranking.csv) in the data-branch.
  • Easily extendable to scrape additional apps by editing the workflow.

How It Works

  1. The GitHub Actions workflow is configured to run daily using a cron job (cron: '0 9 * * *') or can be manually triggered from the "Actions" tab in the repository.
  2. For each app URL defined in the workflow, the script scrapes the app's ranking, star rating, and total number of ratings.
  3. The scraped data is then appended to the apps_ranking.csv file in the data-branch.
  4. Each job runs in parallel for faster scraping of multiple apps.

Running the Scraper Locally

If you want to test the scraper locally before running it on GitHub Actions, you can follow these steps:

Prerequisites

  • Python 3.x installed on your local machine.

Steps

  1. Clone the repository:
    git clone https://github.com/your-username/your-repo.git
    cd your-repo
    
  2. Install the required dependencies: The dependencies are listed in the requirements.txt file. To install them, run:
    pip install -r requirements.txt
  3. Run the scraper: You can run the scraper for a specific app URL by executing the run_scraper.py script and passing the app URL as a parameter:
    python run_scraper.py --app_url "https://apps.apple.com/us/app/google/id284815942"
  4. Check the output: The scraped data will be appended to apps_ranking.csv, which will be created in the local directory if it doesn't already exist.
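
For readers modifying the command-line interface, the following is a minimal sketch of how a script could accept the --app_url flag used in the command above. The function name parse_args and the help text are hypothetical; the real run_scraper.py may parse its arguments differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line for a single-app scraping run.

    Hypothetical helper: only the --app_url flag name comes from the README.
    """
    parser = argparse.ArgumentParser(description="Scrape one App Store app page.")
    parser.add_argument(
        "--app_url",
        required=True,
        help="Full URL of the App Store page to scrape",
    )
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Passing an explicit list, for example parse_args(["--app_url", "https://apps.apple.com/us/app/google/id284815942"]), makes the parser easy to exercise without touching the real command line.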

Setup

Prerequisites

  • Python 3.x
  • GitHub repository

Workflow Configuration

The workflow is defined in .github/workflows/scraper.yml and will scrape data from the app URLs defined in the APP_URLS environment variable. To modify the list of apps being scraped, add or remove app URLs in the list:

APP_URLS: |
    https://apps.apple.com/us/app/google/id284815942
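
The multi-line APP_URLS value above holds one URL per line. As an illustration only, a newline-separated block like this could be turned into a clean list as follows. The helper parse_app_urls is hypothetical, and how the actual workflow consumes APP_URLS is not shown in this README.

```python
def parse_app_urls(raw: str) -> list[str]:
    """Split a newline-separated block of URLs into a de-duplicated list.

    Hypothetical helper: blank lines and surrounding whitespace are ignored,
    and the original order of the URLs is preserved.
    """
    urls: list[str] = []
    for line in raw.splitlines():
        url = line.strip()
        if url and url not in urls:
            urls.append(url)
    return urls
```

Locally, the same value could be read with parse_app_urls(os.environ.get("APP_URLS", "")) after importing os.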

Running the Workflow

There are two ways to run the scraping workflow:

  1. Automatic Daily Runs: The workflow will run automatically every day at 09:00 UTC based on the cron schedule.
  2. Manual Trigger: You can manually trigger the workflow via the "Actions" tab in the GitHub repository:
    • Go to the Actions tab.
    • Select the "Daily Scraper" workflow.
    • Click on the "Run workflow" button to start the scraper immediately.
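
To make the schedule concrete, the sketch below computes the next nominal run time for a daily 09:00 UTC schedule such as cron: '0 9 * * *'. The function next_daily_run is a hypothetical illustration and not part of this repository; GitHub Actions may also start scheduled runs later than the nominal time when its runners are under heavy load.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 9) -> datetime:
    """Return the first daily run time strictly after `now`, in UTC.

    `now` must be timezone-aware; it is converted to UTC before comparing.
    """
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), time(hour, 0), tzinfo=timezone.utc)
    if candidate <= now_utc:
        # Today's slot has already passed (or is exactly now), so use tomorrow's.
        candidate += timedelta(days=1)
    return candidate
```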

Modifying the Scraper Script

The Python script run_scraper.py is designed to take an --app_url argument, which is passed by the GitHub Actions workflow. The script scrapes the app's ranking, star rating, and total number of ratings and appends them to the CSV file.

Feel free to modify the scraping logic or add additional data points to be extracted as needed.

CSV Output

  • The scraped data is stored in a CSV file (apps_ranking.csv) in the data-branch.
  • Each row in the CSV contains the following columns:
    • Timestamp: The date and time of the scraping run.
    • App URL: The URL of the scraped app.
    • Ranking: The app's ranking in its category.
    • Star Rating: The app's star rating.
    • Total Number of Ratings: The total number of user ratings.
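
The append step can be sketched as follows, using the column names listed above. This is an assumption-laden illustration: the helper append_row is hypothetical, and whether the real file has a header row or uses ISO 8601 timestamps is not stated in this README.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Column names taken from the list above; using them as a header row is an assumption.
COLUMNS = ["Timestamp", "App URL", "Ranking", "Star Rating", "Total Number of Ratings"]


def append_row(csv_path, app_url, ranking, star_rating, total_ratings, now=None):
    """Append one scraping result, writing the header first if the file is new."""
    path = Path(csv_path)
    write_header = not path.exists() or path.stat().st_size == 0
    # Assumed timestamp format: ISO 8601 in UTC, second precision.
    timestamp = (now or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if write_header:
            writer.writerow(COLUMNS)
        writer.writerow([timestamp, app_url, ranking, star_rating, total_ratings])
```

Opening the file in append mode leaves earlier rows untouched, which matches the append-only behavior described in this README.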

Data Branch

All scraped data is committed to the data-branch to keep the main branch clean. You can access the data-branch directly or fetch the CSV file from there.
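
Once the CSV text has been fetched from the data-branch, the most recent entry for each app could be picked out as in the sketch below. The helper latest_by_app is hypothetical, and the code assumes that the file starts with a header row using the column names from the CSV Output section and that rows appear in chronological order because they are appended.

```python
import csv
import io


def latest_by_app(csv_text: str) -> dict[str, dict[str, str]]:
    """Map each App URL to its last row in the CSV text.

    Assumes a header row with an "App URL" column and rows in append order.
    """
    latest: dict[str, dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Later rows overwrite earlier ones, so the final value is the newest.
        latest[row["App URL"]] = row
    return latest
```

If the file sits at the root of the branch, which is also an assumption, its text can be obtained from a local clone with git show origin/data-branch:apps_ranking.csv.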
