Commit

v4.0.0 (#12)
* Timeseries notebook update (#2)

* updates for gs/rs

* run with the removal of errors

* change to images path required for .md display, update to AutoML notebooks to remove errors

* addition of feature impact/confmat for automl

* updated Automl to reflect NLP addition. Fixed dockerfile

* removed image directory in docker

* new clustering updates

* hc fixes

* ap fixes

* added time series notebooks

* updated docker to use pip to install ml requirements

* added result show for ap

* rename notebook

* updated README

* updated README

* Delete 13 Time Series Forecasting.ipynb

* time series review

* added extra notes for TS notebook

* update to time series notebook, change to utilities to use util namespace

* Review of time series notebook and utils update (#3)

* clustering updates

* nlp updates

* clustering and automl review

* updated graphics

* pulled updated version

* general plotting functions

* review of time series and utils update

* cluster update

Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>

Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>
Co-authored-by: Dianeod <dodonoghue@kx.com>
Co-authored-by: Dianeod <40861871+Dianeod@users.noreply.github.com>

* Addition of time series notebooks. Updated docker pip installs (#1)

* updates for gs/rs

* run with the removal of errors

* change to images path required for .md display, update to AutoML notebooks to remove errors

* addition of feature impact/confmat for automl

* updated Automl to reflect NLP addition. Fixed dockerfile

* removed image directory in docker

* new clustering updates

* hc fixes

* ap fixes

* added time series notebooks

* updated docker to use pip to install ml requirements

* added result show for ap

* rename notebook

* updated README

* updated README

* Delete 13 Time Series Forecasting.ipynb

* time series review

* added extra notes for TS notebook

* clustering updates

* nlp updates

* update to time series notebook, change to utilities to use util namespace

* clustering and automl review

* updated graphics

* pulled updated version

* general plotting functions

* review of time series and utils update

* cluster update

Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
Co-authored-by: Conor McCarthy <cmccarthy1@kx.com>
Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>

* Update to clustering notebook to use kmeans dictionary inputs

Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>
Co-authored-by: Dianeod <dodonoghue@kx.com>
Co-authored-by: Dianeod <40861871+Dianeod@users.noreply.github.com>
Co-authored-by: Conor McCarthy <conormccarthy@brainpool1.mynet>
6 people authored Oct 6, 2020
1 parent ae088d1 commit f0152bc
Showing 28 changed files with 46,313 additions and 1,343 deletions.
6 changes: 4 additions & 2 deletions README.md
@@ -16,7 +16,7 @@ The Kx NLP library can be used to answer a variety of questions about unstructur

## ML-Toolkit

-The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, and clustering algorithms.
+The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, clustering algorithms, time series forecasting models and feature engineering functions.

## AutoML

@@ -47,6 +47,8 @@ The contents of the notebooks are as follows:

11. **Clustering**: Examples of how to use the k-means, DBSCAN, affinity propagation, hierarchical and CURE algorithms available within the ML-Toolkit are provided. The notebook demonstrates how to effectively visualize results produced and make use of scoring functions contained within the toolkit. A real-world application is also included.

+12. **Time Series Forecasting**: The notebook looks at a variety of time series forecasting models contained within the ML-Toolkit such as AR, ARIMA and SARIMA models along with time series specific feature engineering tools for passing time series data to supervised machine learning models.
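
The "time series specific feature engineering" mentioned in the new README entry usually means reshaping a series so that past values become input columns for a supervised model. The sketch below illustrates that idea in plain Python. It is not the ML-Toolkit's q API: the function name `make_lagged_features` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def make_lagged_features(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Turn a univariate series into (features, targets) for a supervised model.

    Row i holds the n_lags values that precede series[i + n_lags],
    ordered oldest to newest; the matching target is series[i + n_lags].
    This is an illustrative helper, not part of the ML-Toolkit.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")
    n_rows = len(series) - n_lags
    features = [list(series[i:i + n_lags]) for i in range(n_rows)]
    targets = [series[i + n_lags] for i in range(n_rows)]
    return features, targets


if __name__ == "__main__":
    X, y = make_lagged_features([10, 12, 13, 15, 18], n_lags=2)
    print(X)  # [[10, 12], [12, 13], [13, 15]]
    print(y)  # [13, 15, 18]
```

Each row of `X` can then be passed to any regression model, with `y` as the value to predict one step ahead.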

## Requirements

- kdb+>=? v3.5 64-bit
@@ -88,4 +90,4 @@ For subsequent runs, you will not be prompted to redo the license setup when cal
docker start -ai mymlnotebooks


-**N.B.** [build instructions for the image are available](docker/README.md)
+**N.B.** [build instructions for the image are available](docker/README.md)
25,001 changes: 25,001 additions & 0 deletions data/IMBD.csv

Large diffs are not rendered by default.

17,415 changes: 17,415 additions & 0 deletions data/london_merged.csv

Large diffs are not rendered by default.

5 changes: 2 additions & 3 deletions docker/Dockerfile
@@ -15,7 +15,6 @@ FROM jupyterq AS mlnotebooks

COPY requirements.txt README.md /opt/kx/mlnotebooks/
COPY data/ /opt/kx/mlnotebooks/data/
-COPY images/ /opt/kx/mlnotebooks/images/
COPY notebooks/ /opt/kx/mlnotebooks/notebooks/
COPY utils/ /opt/kx/mlnotebooks/utils/
#hack, better way, tensorflow-gpu should be used if possible
@@ -65,10 +64,10 @@ USER kx
RUN . /opt/conda/etc/profile.d/conda.sh \
&& conda activate kx \
&& conda install --file /opt/kx/nlp/requirements.txt \
&& conda update wrapt \
&& pip install -r /opt/kx/mlnotebooks/requirements.txt \
&& conda install -c anaconda graphviz \
&& conda install -c conda-forge --file /opt/kx/ml/requirements.txt \
&& pip install pip==9.0.1 \
&& pip install -r /opt/kx/ml/requirements.txt \
&& conda install -c conda-forge --file /opt/kx/automl/requirements.txt \
&& conda clean -y --all \
&& python -m spacy download en \
227 changes: 114 additions & 113 deletions notebooks/01 Decision Trees.ipynb

Large diffs are not rendered by default.

87 changes: 44 additions & 43 deletions notebooks/02 Random Forests.ipynb

Large diffs are not rendered by default.

103 changes: 52 additions & 51 deletions notebooks/03 Neural Networks.ipynb

Large diffs are not rendered by default.

183 changes: 92 additions & 91 deletions notebooks/04 Dimensionality Reduction.ipynb

Large diffs are not rendered by default.

53 changes: 27 additions & 26 deletions notebooks/05 Feature Engineering.ipynb

Large diffs are not rendered by default.

403 changes: 164 additions & 239 deletions notebooks/06 Feature Extraction and Selection.ipynb

Large diffs are not rendered by default.

319 changes: 221 additions & 98 deletions notebooks/07 Cross Validation.ipynb

Large diffs are not rendered by default.

18 changes: 9 additions & 9 deletions notebooks/08 Natural Language Processing.ipynb
@@ -361,17 +361,17 @@
],
"source": [
"/ plot occurence of top terms per chapter\n",
-"plt[`:figure][`figsize pykw 20 10];\n",
+".util.plt[`:figure][`figsize pykw 20 10];\n",
"{a:exec chapter from tab where term=x;\n",
" b:exec occurences from tab where term=x;\n",
-" plt[`:plot][a;b];\n",
+" .util.plt[`:plot][a;b];\n",
" }each key 10#keywords; \n",
"\n",
-"plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
-"plt[`:ylabel]\"Occurences\";\n",
-"plt[`:xlabel]\"Chapter\";\n",
-"plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
-"plt[`:show][];"
+".util.plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
+".util.plt[`:ylabel]\"Occurences\";\n",
+".util.plt[`:xlabel]\"Chapter\";\n",
+".util.plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
+".util.plt[`:show][];"
]
},
{
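
For readers less familiar with embedPy, the q cell in the hunk above drives matplotlib's `pyplot` through the `.util.plt` handle. A rough Python equivalent is sketched here, assuming the table `tab` is available as rows of `(term, chapter, occurrences)`. The function names `series_per_term` and `plot_term_occurrences` are hypothetical and do not appear in the repository.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def series_per_term(
    rows: Iterable[Tuple[str, int, int]], terms: List[str]
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Group (term, chapter, occurrences) rows into one (chapters, counts) pair per term.

    Mirrors the two q `exec ... where term=x` queries run for each keyword.
    """
    grouped = defaultdict(lambda: ([], []))
    for term, chapter, count in rows:
        if term in terms:
            grouped[term][0].append(chapter)
            grouped[term][1].append(count)
    # Preserve the order of `terms` so legend labels line up with plotted lines.
    return {t: grouped[t] for t in terms if t in grouped}


def plot_term_occurrences(rows: Iterable[Tuple[str, int, int]], terms: List[str]) -> None:
    """Plot occurrences per chapter for each term, one line per term."""
    # Imported lazily so the grouping helper works without matplotlib installed.
    import matplotlib.pyplot as plt

    data = series_per_term(rows, terms)
    plt.figure(figsize=(20, 10))
    for chapters, counts in data.values():
        plt.plot(chapters, counts)
    plt.title("Occurrences per chapter of the top keywords")
    plt.ylabel("Occurrences")
    plt.xlabel("Chapter")
    plt.legend(list(data), loc="upper left")
    plt.show()
```

The only change the commit makes to this cell is the handle used to reach `pyplot` (`plt` becomes `.util.plt`); the sequence of plotting calls is unchanged.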
@@ -1146,7 +1146,7 @@
"source": [
"#This table can then be used to plot a graph. The below example was rendered in Analyst for Kx, where node size represents email volume.\n",
"\n",
-"<img src=\"../images/network.png\" />"
+"<img src=\"images/network.png\" />"
]
},
{
@@ -1466,7 +1466,7 @@
"file_extension": ".q",
"mimetype": "text/x-q",
"name": "q",
-"version": "3.6.0"
+"version": "4.0"
}
},
"nbformat": 4,
41 changes: 21 additions & 20 deletions notebooks/09 K Nearest Neighbours.ipynb

Large diffs are not rendered by default.
