Code written in R using an RStudio Notebook. Open the R Markdown file here for code and commentary.
Our goal is to build a model that can predict mpg, the mileage of a vehicle, from its other attributes.
The Auto dataset is available in the ISLR package. The dataset contains 392 observations with 9 attributes for each observation. The attributes are briefly described below:
- mpg - miles per gallon
- cylinders - Number of cylinders between 4 and 8
- displacement - Engine displacement (cu. inches)
- horsepower - Engine horsepower
- weight - Vehicle weight (lbs.)
- acceleration - Time to accelerate from 0 to 60 mph (sec.)
- year - Model year (modulo 100)
- origin - Origin of car (1. American, 2. European, 3. Japanese)
- name - Vehicle name
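As a quick check of the description above, the dataset can be loaded and inspected as follows. This is a minimal sketch that assumes the ISLR package is already installed.

```r
# Assumes the ISLR package is installed: install.packages("ISLR")
library(ISLR)

data(Auto)          # load the Auto data frame shipped with ISLR
dim(Auto)           # number of observations and attributes
str(Auto)           # type of each attribute
summary(Auto$mpg)   # distribution of the response variable
```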
We ignore the name attribute as it is too varied to include in the model. We use all the data to train the model, and estimate the test error through cross-validation.
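The setup described above could look roughly like the sketch below, which drops name and estimates the test MSE of an ordinary least-squares fit by k-fold cross-validation. The fold count (10), the seed, and the names auto, folds, fold_mse and cv_mse are assumptions made for illustration, not details from the notebook.

```r
library(ISLR)

# Drop the name attribute; keep mpg and the seven remaining predictors
auto <- Auto[, names(Auto) != "name"]

set.seed(1)   # arbitrary seed so the fold assignment is reproducible
k <- 10       # number of folds (an assumption)
folds <- sample(rep(1:k, length.out = nrow(auto)))

# Fit on k - 1 folds, then measure squared error on the held-out fold
fold_mse <- sapply(1:k, function(i) {
  train <- auto[folds != i, ]
  test  <- auto[folds == i, ]
  fit   <- lm(mpg ~ ., data = train)   # origin is treated as numeric here
  mean((test$mpg - predict(fit, newdata = test))^2)
})

cv_mse <- mean(fold_mse)   # cross-validated estimate of the test MSE
cv_mse
```

Every observation is used for training in some folds and for testing in exactly one, which is how the full dataset can serve both purposes.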
Determine which variables are useful in predicting the outcome, and perform transformations as required.
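One way to approach this step is to look at how each predictor correlates with mpg, and then check whether a transformation improves a simple fit. The sketch below uses a log transform of horsepower purely as an example; which transformations the notebook actually applies is not stated here.

```r
library(ISLR)

auto <- Auto[, names(Auto) != "name"]

# Correlation of each predictor with mpg, sorted from most negative to most positive
predictors <- setdiff(names(auto), "mpg")
mpg_cor <- sort(cor(auto[, predictors], auto$mpg)[, 1])
round(mpg_cor, 2)

# Compare a linear term with a log-transformed term for one predictor
# (horsepower is an illustrative choice)
r2_linear <- summary(lm(mpg ~ horsepower, data = auto))$r.squared
r2_log    <- summary(lm(mpg ~ log(horsepower), data = auto))$r.squared
c(linear = r2_linear, log = r2_log)

# Scatterplots help spot curvature that a transformation might remove
pairs(auto[, c("mpg", "displacement", "horsepower", "weight")])
```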
Fit the model using:
- Standard Least Squares
- Best-subset selection
- Ridge regression
- Lasso regularization
- Principal Component Regression (PCR)
- Partial Least Squares (PLS)
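The first four methods in the list can be sketched as follows. The sketch assumes the leaps and glmnet packages are installed, picks the subset size by BIC, and uses the default lambda grid and 10-fold cross-validation of cv.glmnet; these are illustrative choices rather than the notebook's confirmed settings.

```r
library(ISLR)
library(leaps)    # regsubsets() for best-subset selection
library(glmnet)   # cv.glmnet() for ridge and lasso

auto <- Auto[, names(Auto) != "name"]
x <- model.matrix(mpg ~ ., data = auto)[, -1]   # predictor matrix, intercept column dropped
y <- auto$mpg

# Standard least squares on all predictors
ols <- lm(mpg ~ ., data = auto)
coef(ols)

# Best-subset selection: best model of each size, then choose a size by BIC
subsets   <- regsubsets(mpg ~ ., data = auto, nvmax = ncol(x))
best_size <- which.min(summary(subsets)$bic)
coef(subsets, best_size)

# Ridge (alpha = 0) and lasso (alpha = 1), with lambda chosen by cross-validation
set.seed(1)   # arbitrary seed for reproducible folds
ridge <- cv.glmnet(x, y, alpha = 0)
lasso <- cv.glmnet(x, y, alpha = 1)

# Cross-validated MSE at the best lambda, and the corresponding coefficients
c(ridge = min(ridge$cvm), lasso = min(lasso$cvm))
coef(ridge, s = "lambda.min")
coef(lasso, s = "lambda.min")
```

Ridge shrinks all coefficients toward zero, while the lasso can set some exactly to zero, so comparing the two coefficient vectors shows which predictors the lasso drops.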
Compare coefficients, MSE, and find the best model.
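For the two dimension-reduction methods, a comparison on cross-validated MSE might look like the sketch below. It assumes the pls package is installed and standardizes the predictors. The resulting values depend on the fold assignment and on any transformations applied beforehand.

```r
library(ISLR)
library(pls)   # pcr() and plsr()

auto <- Auto[, names(Auto) != "name"]

set.seed(1)   # arbitrary seed for reproducible cross-validation segments
pcr_fit <- pcr(mpg ~ ., data = auto, scale = TRUE, validation = "CV")
pls_fit <- plsr(mpg ~ ., data = auto, scale = TRUE, validation = "CV")

# Cross-validated MSE for 1..7 components (the intercept-only entry is dropped)
pcr_mse <- MSEP(pcr_fit, estimate = "CV")$val[1, 1, -1]
pls_mse <- MSEP(pls_fit, estimate = "CV")$val[1, 1, -1]

# Best number of components and the matching MSE for each method
data.frame(
  method     = c("PCR", "PLS"),
  components = c(which.min(pcr_mse), which.min(pls_mse)),
  cv_mse     = c(min(pcr_mse), min(pls_mse))
)
```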
The best model was achieved with Partial Least Squares (PLS), which gave the lowest MSE: 8.677.