This big data project focused on forecasting assessed property values as an indicator of home sale prices in Calgary by training a machine learning model using PySpark.
The following factors were analyzed to predict the housing prices:
- macro-economic conditions such as unemployment rates, average house prices, and supply-demand dynamics
- community-specific elements like local crime rates, location, and community density
- individual housing features such as land size and land use designation
The aim of the project was to provide residents in Calgary with meaningful insights, enabling them to make informed decisions regarding their home purchases.
The following steps were completed to build a regression model for this problem; each step is detailed in the Final Report:
- Data Collection
- Data Inspection and Validation
- Data Filtering
- Data Transformations
- Exploratory Data Analysis
- Model Building and Results