This is a regression problem to predict california housing prices.
The dataset contains 20640 entries and 10 variables.
- Longitude
- Latitude
- Housing Median Age
- Total Rooms
- Total Bedrooms
- Population
- Households
- Median Income
- Median House Value
- Ocean Proximity
Median House Value is to be predicted in this problem.
The first question to ask is what exactly is the business objective; building a model is probably not the end goal. How does the company expect to use and benefit from this model? This is important because it will determine how you frame the problem, what algorithms you will select, what performance measure you will use to evaluate your model, and how much effort you should spend tweaking it.
Understand the requirements of the business. Acquire the dataset. Visualize the data to understand it better and develop our intuition. Pre-process the data to make it ready to feed to our ML model. Try various models and train them. Select one that we find best. Fine-tune our model by tuning hyper-parameters Present our solution to the team. Launch, monitor, and maintain our system.