Machine-Learning-Notebooks

1. Exponential Regression

Modelling bacteria using exponential regression. The exponential data is transformed into linear data by taking $\ln(\cdot)$ of the response, after which regular linear regression techniques can be used. Regressing on temperature and humidity gives $R^2 = 78\%$. An important part of regression is checking that the linear regression assumptions about the residuals hold:

  • Residuals have constant variance
  • Residuals are independent
  • Residuals are normally distributed

The residual analysis was performed with graphical methods, although more formal statistical tests for normality and correlation could also be useful.

image
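The log-transform approach can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's code: the function name `fit_exponential`, the coefficients in `true_beta`, and the generated temperature and humidity values are all assumptions made for the example.

```python
import numpy as np

def fit_exponential(X, y):
    """Fit y ~ exp(b0 + X @ b) by ordinary least squares on ln(y)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])   # prepend an intercept column
    log_y = np.log(y)                           # requires y > 0
    beta, *_ = np.linalg.lstsq(A, log_y, rcond=None)
    residuals = log_y - A @ beta                # residuals on the log scale
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((log_y - log_y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return beta, residuals, r2

# Synthetic data (hypothetical values, for illustration only)
rng = np.random.default_rng(0)
temperature = rng.uniform(10, 40, size=200)
humidity = rng.uniform(20, 90, size=200)
X = np.column_stack([temperature, humidity])
true_beta = np.array([1.0, 0.05, 0.01])
noise = rng.normal(0, 0.1, size=200)
y = np.exp(true_beta[0] + X @ true_beta[1:] + noise)

beta, residuals, r2 = fit_exponential(X, y)
print("coefficients:", beta)
print("R^2 on the log scale:", r2)
```

The returned `residuals` are the quantities to inspect for the three assumptions above, for example by plotting them against the fitted values or in a normal Q-Q plot.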

Next, confidence and prediction bands were calculated for the regression model, as shown below. The confidence band gives the range in which the fitted regression curve itself is expected to lie, reflecting how the fit would vary from sample to sample. The prediction band gives the range in which a new observation is expected to fall. Because it also accounts for the noise in an individual observation, it is always wider than the confidence band.

image
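The difference between the two bands can be made concrete with the standard formulas for a straight-line fit. The sketch below is an assumption-laden illustration rather than the notebook's implementation: `regression_bands` is a hypothetical helper, the data is synthetic, and the critical value `t_crit` is passed in by the caller (roughly 2.01 for a 95% band with 48 degrees of freedom).

```python
import numpy as np

def regression_bands(x, y, x_new, t_crit):
    """Return fitted values and half-widths of confidence/prediction bands."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_new = np.asarray(x_new, dtype=float)
    n = len(x)
    A = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    s2 = resid @ resid / (n - 2)                 # estimate of the noise variance
    A_new = np.column_stack([np.ones(len(x_new)), x_new])
    XtX_inv = np.linalg.inv(A.T @ A)
    # h_i = a_i^T (X^T X)^{-1} a_i for every new point a_i
    h = np.einsum("ij,jk,ik->i", A_new, XtX_inv, A_new)
    y_hat = A_new @ beta
    ci_half = t_crit * np.sqrt(s2 * h)           # uncertainty of the fitted line
    pi_half = t_crit * np.sqrt(s2 * (1.0 + h))   # adds noise of a new observation
    return y_hat, ci_half, pi_half

# Synthetic straight-line data (hypothetical)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, size=x.size)
x_new = np.linspace(0, 10, 5)

y_hat, ci_half, pi_half = regression_bands(x, y, x_new, t_crit=2.01)
```

The only difference between the two half-widths is the extra `1.0 +` term, which is why the prediction band never collapses even when the fitted line is known very precisely.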

2. Polynomial Regression

The welding strength can be modelled as a function of the current used in the welding process with polynomial regression. In this notebook, this is done with polynomials of varying degree. Higher-degree polynomials give less bias but more variance. To counteract this, Tikhonov regularization is applied, which reduces the overfitting of the polynomial to the data.

image image
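A minimal sketch of Tikhonov (ridge) regularized polynomial fitting is given below. The function `ridge_polyfit`, the choice to leave the intercept unpenalized, the regularization strengths, and the synthetic data are assumptions for illustration and are not taken from the notebook.

```python
import numpy as np

def ridge_polyfit(x, y, degree, lam):
    """Least-squares polynomial fit with an L2 penalty on the coefficients."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    V = np.vander(x, degree + 1, increasing=True)   # columns 1, x, x^2, ...
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0          # assumption: do not penalize the intercept
    # Solve the regularized normal equations (V^T V + penalty) w = V^T y
    return np.linalg.solve(V.T @ V + penalty, V.T @ y)

# Synthetic noisy data on [-1, 1] (hypothetical)
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 15)
y = np.sin(3 * x) + rng.normal(0, 0.2, size=x.size)

w_free = ridge_polyfit(x, y, degree=9, lam=1e-6)   # almost unregularized
w_reg = ridge_polyfit(x, y, degree=9, lam=1.0)     # strongly regularized

print("coefficient norm, weak penalty:  ", np.linalg.norm(w_free[1:]))
print("coefficient norm, strong penalty:", np.linalg.norm(w_reg[1:]))
```

A larger `lam` shrinks the higher-order coefficients, trading a little bias for lower variance. In practice `lam` would be chosen by validation.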

Confidence and prediction bands were also computed, as shown below.

image

3. Logistic Regression

Survival on the Titanic is classified with a self-implemented logistic regression model. Using sex, class and age as features, the model reaches an accuracy of ~75%. It is compared with sklearn's logistic regression and kNN classifier. The two logistic regression results are nearly identical (sklearn regularizes too), and both are better than kNN.
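For orientation, a self-implemented logistic regression can be as small as the sketch below. It is not the notebook's code: the function names, the learning rate, the optional `l2` penalty, and the two-cluster synthetic data (used in place of the Titanic features) are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iter=2000, l2=0.0):
    """Gradient descent on the mean log-loss, with an optional L2 penalty."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        p = sigmoid(X @ w + b)                  # predicted probabilities
        grad_w = X.T @ (p - y) / n + l2 * w     # penalty on weights only
        grad_b = np.mean(p - y)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return (sigmoid(np.asarray(X, dtype=float) @ w + b) >= 0.5).astype(int)

# Two overlapping Gaussian clusters as stand-in data (hypothetical)
rng = np.random.default_rng(3)
X0 = rng.normal(-1.0, 1.0, size=(200, 2))
X1 = rng.normal(+1.0, 1.0, size=(200, 2))
X = np.vstack([X0, X1])
y = np.concatenate([np.zeros(200, dtype=int), np.ones(200, dtype=int)])

w, b = fit_logistic(X, y, l2=0.01)
accuracy = np.mean(predict(X, w, b) == y)
print("training accuracy:", accuracy)
```

When comparing against a library implementation, the regularization strength needs to match, since a library default may apply a penalty even when none is requested explicitly.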

4. Softmax Regression

Classification of handwritten digits (MNIST dataset). Using full-batch gradient descent (no mini-batching), the model yields an accuracy of ~90%. The confusion matrix is displayed below.

image
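The training loop and the confusion matrix can be sketched as follows. To keep the example self-contained it uses three synthetic 2-D clusters instead of MNIST. The function names, learning rate, and iteration count are assumptions, not the notebook's settings.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(axis=1, keepdims=True)    # subtract row max for stability
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_softmax(X, y, n_classes, lr=0.1, n_iter=500):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                # one-hot encoded labels
    for _ in range(n_iter):
        P = softmax(X @ W + b)
        G = (P - Y) / n                     # gradient w.r.t. the logits
        W -= lr * (X.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    C = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(C, (y_true, y_pred), 1)
    return C

# Three well-separated clusters as stand-in data (hypothetical)
rng = np.random.default_rng(4)
centers = np.array([[0.0, 3.0], [-3.0, -2.0], [3.0, -2.0]])
y = np.repeat(np.arange(3), 100)
X = centers[y] + rng.normal(0, 1.0, size=(300, 2))

W, b = fit_softmax(X, y, n_classes=3)
y_pred = np.argmax(X @ W + b, axis=1)
C = confusion_matrix(y, y_pred, n_classes=3)
accuracy = np.trace(C) / C.sum()
print(C)
print("training accuracy:", accuracy)
```

Every iteration uses the whole dataset, which is simple but scales poorly. Mini-batching would be the usual next step for a dataset the size of MNIST.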

5. Time Series Analysis

Analysing a sea-level dataset from NASA. AR, MA and ARMA models are used, in addition to a nonlinear AR model based on a feedforward neural network. The results of the network approach are, however, quite disappointing.

image
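As a rough illustration of the AR part, an AR(p) model can be fitted by ordinary least squares on lagged values. The sketch below is one of several possible estimation methods and is not necessarily the one used in the notebook. The helper names and the simulated AR(2) series (used in place of the sea-level data) are assumptions.

```python
import numpy as np

def fit_ar(series, p):
    """Least-squares fit of x[t] = c + phi_1 x[t-1] + ... + phi_p x[t-p]."""
    x = np.asarray(series, dtype=float)
    n = len(x)
    # Column k holds the lag-k values aligned with the targets x[p:], k = 1..p
    lags = np.column_stack([x[p - k : n - k] for k in range(1, p + 1)])
    A = np.column_stack([np.ones(n - p), lags])
    coef, *_ = np.linalg.lstsq(A, x[p:], rcond=None)
    return coef                               # [c, phi_1, ..., phi_p]

def forecast_one_step(series, coef):
    """Predict the next value from the last p observations."""
    p = len(coef) - 1
    recent = np.asarray(series[-p:], dtype=float)[::-1]   # x[t-1], x[t-2], ...
    return coef[0] + recent @ coef[1:]

# Simulated AR(2) process as stand-in data (hypothetical coefficients)
rng = np.random.default_rng(5)
n = 2000
eps = rng.normal(size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + eps[t]

coef = fit_ar(x, p=2)
next_value = forecast_one_step(x, coef)
print("estimated [c, phi_1, phi_2]:", coef)
```

A nonlinear AR model keeps the same lagged-input layout and replaces the linear map with a feedforward network, so a linear fit like this one is a useful baseline to compare against.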