This repository originally contained a number of computational models which can be used for data science.
In addition, there are some utilities.
In src:
- An object parser which converts JSON data to Python classes: object_parser.py.
- An OAS-generator for Python classes: oas.py.
- A parallelization framework for load testing: parallel.py.
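To illustrate the idea behind converting JSON data to Python classes, here is a minimal sketch built on the standard-library dataclasses module. It is not the code in object_parser.py; the function parse_object and the classes Point and Segment are hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from typing import Type, TypeVar

T = TypeVar("T")


def parse_object(cls: Type[T], data: dict) -> T:
    """Build an instance of the dataclass `cls` from a decoded JSON object."""
    kwargs = {}
    for field in fields(cls):
        value = data[field.name]
        # Recurse when the field is annotated with another dataclass.
        if is_dataclass(field.type) and isinstance(value, dict):
            value = parse_object(field.type, value)
        kwargs[field.name] = value
    return cls(**kwargs)


# Hypothetical example classes.
@dataclass
class Point:
    x: float
    y: float


@dataclass
class Segment:
    start: Point
    end: Point


raw = '{"start": {"x": 0.0, "y": 1.0}, "end": {"x": 2.0, "y": 3.0}}'
segment = parse_object(Segment, json.loads(raw))
```

The sketch only handles required fields and nested dataclasses. Lists, optional fields and type validation would need extra handling.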
./main <TAB>
# this will list the supported modules
echo fft linear_fit random_walk semilinear_fit
Using main
./main random_walk
./main linear_fit
./main semilinear_fit
Or, as Python modules
python3 src/data_science/random_walk.py
Below are examples of various models, ranging from simple linear models with analytical solutions to more complex models with numerical solutions.
src/random_walk.py generates datasets that behave like random walks.
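As a rough illustration of what such a dataset looks like, a one-dimensional random walk can be generated as the running sum of independent Gaussian steps. This is a sketch under that assumption, not the code in src/random_walk.py; the function name random_walk and its parameters are hypothetical.

```python
import random
from itertools import accumulate
from typing import List, Optional


def random_walk(n_steps: int, step_std: float = 1.0,
                seed: Optional[int] = None) -> List[float]:
    """Return n_steps + 1 positions of a walk that starts at 0."""
    rng = random.Random(seed)
    # Each step is an independent Gaussian increment.
    steps = [rng.gauss(0.0, step_std) for _ in range(n_steps)]
    # The position is the running sum of the steps.
    return list(accumulate(steps, initial=0.0))


walk = random_walk(100, seed=42)
```

Fixing the seed makes the walk reproducible, which is convenient when the same dataset is reused to compare models.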
src/linear_fit.py fits linear models. The simplicity of the models reduces overfitting, but this is not explicitly tested.
- A linear regression model using normalized input data, while assuming a specific function (e.g. quadratic or exponential).
- Polynomial regression. A linear model (w.r.t. the parameters) that uses non-linear basis functions. Note that the fit for the exponential signal on the right-most plot is poor.
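The polynomial regression item above can be sketched with NumPy as an ordinary least-squares fit on a polynomial design matrix built from normalized input. The helper fit_polynomial is a hypothetical name, and the script in this repository may be implemented differently.

```python
import numpy as np


def fit_polynomial(x, y, degree):
    """Least-squares fit of a polynomial; returns a prediction function."""
    # Normalize the input to zero mean and unit variance for better conditioning.
    mean, std = x.mean(), x.std()
    z = (x - mean) / std
    # Columns are z**0, z**1, ..., z**degree: non-linear in x, linear in the weights.
    design = np.vander(z, degree + 1, increasing=True)
    weights, *_ = np.linalg.lstsq(design, y, rcond=None)

    def predict(x_new):
        z_new = (np.asarray(x_new, dtype=float) - mean) / std
        return np.vander(z_new, degree + 1, increasing=True) @ weights

    return predict


x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x**2 - 3.0 * x + 1.0  # noise-free quadratic signal
predict = fit_polynomial(x, y, degree=2)
```

Because the model is linear in its weights, the fit has a closed-form solution even though the basis functions are non-linear. A low-degree polynomial cannot follow an exponential signal for long, which is consistent with the poor fit noted above.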
src/semilinear_fit.py fits various non-linear models.
- Bayesian ridge regression, with polynomial and sinusoidal basis functions.
- A Gaussian Process.
Note that these models estimate both a mean and a standard deviation, which can be used to define a confidence interval (C.I.).
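To show where the mean and standard deviation come from, here is a simplified sketch of Bayesian linear regression with polynomial and sinusoidal basis functions. Unlike a full Bayesian ridge implementation, it assumes the weight precision alpha and the noise precision beta are fixed by hand; all function names are hypothetical.

```python
import numpy as np


def basis(x, degree=2, frequencies=(1.0,)):
    """Design matrix with polynomial and sinusoidal columns."""
    x = np.asarray(x, dtype=float)
    columns = [x**d for d in range(degree + 1)]
    for f in frequencies:
        columns += [np.sin(f * x), np.cos(f * x)]
    return np.column_stack(columns)


def fit(x, y, alpha=1e-3, beta=100.0):
    """Posterior mean and covariance of the weights (alpha, beta assumed known)."""
    phi = basis(x)
    precision = alpha * np.eye(phi.shape[1]) + beta * phi.T @ phi
    cov = np.linalg.inv(precision)
    mean = beta * cov @ phi.T @ y
    return mean, cov


def predict(x_new, mean, cov, beta=100.0):
    """Predictive mean and standard deviation at x_new."""
    phi = basis(x_new)
    y_mean = phi @ mean
    # Noise variance plus the uncertainty in the weights.
    y_var = 1.0 / beta + np.sum((phi @ cov) * phi, axis=1)
    return y_mean, np.sqrt(y_var)


rng = np.random.default_rng(0)
x = np.linspace(0.0, 6.0, 40)
y = 0.5 * x + np.sin(x) + rng.normal(0.0, 0.1, size=x.shape)

w_mean, w_cov = fit(x, y)
y_mean, y_std = predict(np.array([3.0, 20.0]), w_mean, w_cov)
# An approximate 95% interval under a Gaussian predictive distribution.
lower, upper = y_mean - 1.96 * y_std, y_mean + 1.96 * y_std
```

The standard deviation grows for inputs far from the training data, so the interval widens where the model extrapolates.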
The accuracy is derived from the relative mean absolute error. It is an overestimate, because the test data overlaps with the training data.
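The exact formula is not stated here. One common definition, assumed in this sketch, divides the summed absolute error by the summed absolute target values and reports the accuracy as one minus that ratio. The function names are hypothetical.

```python
from typing import Sequence


def relative_mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Summed absolute error relative to the summed absolute targets (assumed definition)."""
    abs_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    scale = sum(abs(t) for t in y_true)
    return abs_error / scale


def accuracy(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Accuracy as one minus the relative error (assumed definition)."""
    return 1.0 - relative_mae(y_true, y_pred)


y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [2.0, 5.0, 6.0, 7.0]
score = accuracy(y_true, y_pred)
```

For the toy data the absolute errors sum to 2 and the targets sum to 20, so the relative error is 0.1. Computing the score on points that were held out from training would avoid the overestimation.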
Sampling from the Gaussian Process produces a collection of possible futures.
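A minimal NumPy sketch of this sampling step follows. It conditions a Gaussian Process with an RBF kernel on observed data and draws joint samples over future inputs. The kernel choice, the hyperparameters and the function names are assumptions for illustration, not the settings used in src/semilinear_fit.py.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D input arrays."""
    sq_dist = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def sample_futures(x_train, y_train, x_future, n_samples=5, noise=1e-2, seed=0):
    """Draw n_samples joint samples from the GP posterior at x_future."""
    k_tt = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_tf = rbf_kernel(x_train, x_future)
    k_ff = rbf_kernel(x_future, x_future)
    # Condition the GP prior on the observed data.
    solved = np.linalg.solve(k_tt, k_tf)  # K_tt^{-1} K_tf
    mean = solved.T @ y_train
    cov = k_ff - k_tf.T @ solved
    # A small jitter keeps the covariance numerically positive definite.
    chol = np.linalg.cholesky(cov + 1e-6 * np.eye(len(x_future)))
    rng = np.random.default_rng(seed)
    draws = rng.standard_normal((len(x_future), n_samples))
    # Each column is one possible future.
    return mean[:, None] + chol @ draws


x_train = np.linspace(0.0, 5.0, 20)
y_train = np.sin(x_train)
x_future = np.linspace(5.0, 8.0, 30)
futures = sample_futures(x_train, y_train, x_future)
```

Each column of futures is a smooth curve that agrees with the data near the last observation and spreads out further into the future, where the posterior variance is larger.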
Using a Makefile for convenience.
make install
make test
Set up completions
source setup/setup.sh