
An end-to-end project to develop a salary estimation model based on given job info


mz-zarei/SalaryPredictionProject


Data science job salary estimation model development (end-to-end project)

Project phases:

  1. Data collection (Done)

    • Over 1,000 job listings were scraped from Glassdoor using the Selenium library
    • For each job, the following fields were collected: Job title, Salary Estimate, Job Description, Rating, Pros from reviews, Cons from reviews, Company, Location, Company Size, Company Founded Date, Type of Ownership, Industry, Sector, and Revenue
  2. Data cleaning

    • Parsed the salary estimate text, company name, location, and state
    • Extracted new features: company age, seniority level, job category, and required skills (AWS, Python, SQL, SAS)
  3. Exploratory Data Analysis (EDA)

    • Investigated feature distributions and value counts
    • Analysed linear correlations between features
    • Built pivot tables to compare salaries across groups such as sector and position
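The parsing and feature-extraction work in the data cleaning phase can be sketched in plain Python. This is a minimal sketch, not the project's code: it assumes salary strings look like "$53K-$91K (Glassdoor est.)" with values in thousands of dollars, and the helper names (parse_salary_estimate, seniority, extract_features), the record keys (job_title, job_description, founded), and the keyword lists are all hypothetical.

```python
import re

# Assumed Glassdoor salary format, e.g. "$53K-$91K (Glassdoor est.)"; values in $K.
SALARY_RE = re.compile(r"\$(\d+)K\s*-\s*\$(\d+)K")

# Hypothetical keyword lists, chosen for illustration only.
SENIOR_KEYWORDS = ("senior", "sr", "lead", "principal")
JUNIOR_KEYWORDS = ("junior", "jr")
SKILLS = ("aws", "python", "sql", "sas")


def parse_salary_estimate(text):
    """Return (min, max, average) salary in $K, or None if the text does not match."""
    match = SALARY_RE.search(text)
    if match is None:
        return None
    low, high = int(match.group(1)), int(match.group(2))
    return low, high, (low + high) / 2


def seniority(title):
    """Classify a job title as 'senior', 'junior' or 'na' by keyword matching."""
    words = re.findall(r"[a-z]+", title.lower())
    if any(w in SENIOR_KEYWORDS for w in words):
        return "senior"
    if any(w in JUNIOR_KEYWORDS for w in words):
        return "junior"
    return "na"


def extract_features(job, current_year):
    """Derive a few features from one scraped job record (a dict with assumed keys)."""
    description = job["job_description"].lower()
    founded = job.get("founded")
    features = {
        "seniority": seniority(job["job_title"]),
        # Company age in years; -1 marks a missing founded date (assumed convention).
        "age": current_year - founded if founded else -1,
    }
    for skill in SKILLS:
        # 1 if the skill appears as a whole word in the description, else 0.
        features[skill] = int(re.search(r"\b" + skill + r"\b", description) is not None)
    return features


print(parse_salary_estimate("$53K-$91K (Glassdoor est.)"))  # (53, 91, 72.0)
```

Whole-word matching keeps, for example, "mysql" from being counted as a SQL requirement; a real pipeline would likely apply these helpers row by row over a pandas DataFrame.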

Salary by Sector

Salary by Position

Word cloud of job descriptions
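Pivot tables like the ones behind the salary-by-sector and salary-by-position charts can be sketched with pandas.pivot_table. The DataFrame below is made-up toy data, and the column names (sector, position, avg_salary) are assumptions rather than the project's actual schema.

```python
import pandas as pd

# Hypothetical toy rows for illustration only (not the project's data).
df = pd.DataFrame({
    "sector": ["Finance", "Finance", "Health Care", "Health Care", "IT"],
    "position": ["data scientist", "data analyst", "data scientist", "data analyst", "data engineer"],
    "avg_salary": [100, 120, 80, 90, 130],
})

# Mean salary per sector, highest first.
by_sector = (
    pd.pivot_table(df, index="sector", values="avg_salary", aggfunc="mean")
    .sort_values("avg_salary", ascending=False)
)

# Mean salary per position.
by_position = pd.pivot_table(df, index="position", values="avg_salary", aggfunc="mean")

print(by_sector)
print(by_position)
```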

  4. Model training
    • Categorical variables were converted to dummy variables
    • Linear relationships between the features and the target were analysed
    • Four models were trained, fine-tuned, and evaluated on a test set
      • Random forest and XGBoost performed best (MAE of about 8)
  5. Productionizing the ML model
    • A Flask API endpoint is hosted on a local web server
    • The API endpoint takes in a request with a list of values from a job listing and returns an estimated salary
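The training and serving steps can be sketched together: dummy-encode categorical columns, fit a random forest, score it with MAE on a held-out split, pickle it, and expose it through a Flask endpoint that accepts a list of values. This is a minimal sketch under stated assumptions: the toy data, column names, the /predict route, and the "input" JSON key are hypothetical, only one of the four models (random forest) is shown, and no hyperparameter tuning is done.

```python
import pickle

import pandas as pd
from flask import Flask, jsonify, request
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the cleaned job listings.
df = pd.DataFrame({
    "sector": ["Finance", "IT", "Health Care", "IT"] * 10,
    "seniority": ["senior", "na", "junior", "senior"] * 10,
    "python": [1, 1, 0, 1] * 10,
    "avg_salary": [120, 110, 70, 140] * 10,
})

# Convert categorical columns to 0/1 dummy variables.
X = pd.get_dummies(df.drop(columns="avg_salary"), dtype=int)
y = df["avg_salary"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))

# Pickle the model together with the training column order
# (kept in memory here; a real project would write it to a file).
blob = pickle.dumps({"model": model, "columns": list(X.columns)})
saved = pickle.loads(blob)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Expect JSON like {"input": [v1, v2, ...]} in training-column order."""
    values = request.get_json()["input"]
    row = pd.DataFrame([values], columns=saved["columns"])
    estimate = saved["model"].predict(row)[0]
    return jsonify({"estimated_salary": float(estimate)})
```

Calling app.run() would serve the endpoint on a local web server. Keeping the training column order next to the model is one way to make sure a plain list of values lines up with the dummy-encoded features the model was trained on.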

Resources

Code Reuse

Python Version: 3.7
Packages: pandas, numpy, sklearn, matplotlib, seaborn, selenium, flask, json, pickle
Web framework requirements: pip install -r requirements.txt
