v4.0.0 (#12)

* Timeseries notebook update (#2)
  * updates for gs/rs
  * run with the removal of errors
  * change to images path required for .md display, update to AutoML notebooks to remove errors
  * addition of feature impact/confmat for automl
  * updated Automl to reflect NLP addition. Fixed dockerfile
  * removed image directory in docker
  * new clustering updates
  * hc fixes
  * ap fixes
  * added time series notebooks
  * updated docker to use pip to install ml requirements
  * added result show for ap
  * rename notebook
  * updated README
  * updated README
  * Delete 13 Time Series Forecasting.ipynb
  * time series review
  * added extra notes for TS notebook
  * update to time series notebook, change to utilities to use util namespace
  * Review of time series notebook and utils update (#3)
    * clustering updates
    * nlp updates
    * clustering and automl review
    * updated graphics
    * pulled updated version
    * general plotting functions
    * review of time series and utils update
    * cluster update

    Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
    Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>

  Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
  Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>
  Co-authored-by: Dianeod <dodonoghue@kx.com>
  Co-authored-by: Dianeod <40861871+Dianeod@users.noreply.github.com>

* Addition of time series notebooks. Updated docker pip installs (#1)
  * updates for gs/rs
  * run with the removal of errors
  * change to images path required for .md display, update to AutoML notebooks to remove errors
  * addition of feature impact/confmat for automl
  * updated Automl to reflect NLP addition. Fixed dockerfile
  * removed image directory in docker
  * new clustering updates
  * hc fixes
  * ap fixes
  * added time series notebooks
  * updated docker to use pip to install ml requirements
  * added result show for ap
  * rename notebook
  * updated README
  * updated README
  * Delete 13 Time Series Forecasting.ipynb
  * time series review
  * added extra notes for TS notebook
  * clustering updates
  * nlp updates
  * update to time series notebook, change to utilities to use util namespace
  * clustering and automl review
  * updated graphics
  * pulled updated version
  * general plotting functions
  * review of time series and utils update
  * cluster update

  Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
  Co-authored-by: Conor McCarthy <cmccarthy1@kx.com>
  Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>

* Update to clustering notebook to use kmeans dictionary inputs

Co-authored-by: Deanna Morgan <dmorgan1@kx.com>
Co-authored-by: dmorgankx <44678213+dmorgankx@users.noreply.github.com>
Co-authored-by: Dianeod <dodonoghue@kx.com>
Co-authored-by: Dianeod <40861871+Dianeod@users.noreply.github.com>
Co-authored-by: Conor McCarthy <conormccarthy@brainpool1.mynet>
KxSystems · Oct 6, 2020 · commit f0152bc
1 parent ae088d1

Showing 28 changed files with 46,313 additions and 1,343 deletions.
diff --git a/README.md b/README.md
@@ -16,7 +16,7 @@ The Kx NLP library can be used to answer a variety of questions about unstructur
 
 ## ML-Toolkit
 
-The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, and clustering algorithms.
+The toolkit contains libraries and scripts that provide kdb+/q users with general-use functions and procedures to perform machine-learning tasks on a wide variety of datasets. This includes utility functions, the FRESH (FeatuRe Extraction and Scalable Hypothesis testing) algorithm, cross validation and grid search procedures, clustering algorithms, time series forecasting models and feature engineering functions.
 
 ## AutoML
 
@@ -47,6 +47,8 @@ The contents of the notebooks are as follows:
 
 11. **Clustering**: Examples of how to use the k-means, DBSCAN, affinity propagation, hierarchical and CURE algorithms available within the ML-Toolkit are provided. The notebook demonstrates how to effectively visualize results produced and make use of scoring functions contained within the toolkit. A real-world application is also included.
 
+12. **Time Series Forecasting**: The notebook looks at a variety of time series forecasting models contained within the ML-Toolkit such as AR, ARIMA and SARIMA models along with time series specific feature engineering tools for passing time series data to supervised machine learning models.
+
 ## Requirements 
 
 - kdb+>=? v3.5 64-bit
@@ -88,4 +90,4 @@ For subsequent runs, you will not be prompted to redo the license setup when cal
 	docker start -ai mymlnotebooks
 
 
-**N.B.** [build instructions for the image are available](docker/README.md)
+**N.B.** [build instructions for the image are available](docker/README.md)
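
The new README entry for the Time Series Forecasting notebook mentions feature engineering tools for passing time series data to supervised machine learning models. As a rough illustration of that general idea only, the Python sketch below builds lagged features from a univariate series. It is not the ML-Toolkit's q implementation: `make_lag_features` and the sample values are hypothetical and exist purely for this example.

```python
from typing import List, Sequence, Tuple


def make_lag_features(
    series: Sequence[float], n_lags: int
) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: turn a series into (X, y) rows for a supervised model.

    Each row of X holds the n_lags values immediately preceding the target,
    oldest first; y holds the value to be predicted.
    """
    if n_lags < 1:
        raise ValueError("n_lags must be at least 1")
    if len(series) <= n_lags:
        raise ValueError("series must be longer than n_lags")

    features: List[List[float]] = []
    targets: List[float] = []
    for t in range(n_lags, len(series)):
        # The window series[t - n_lags : t] ends just before the target index t.
        features.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return features, targets


# Made-up sample data, used only to show the shape of the output.
X, y = make_lag_features([10.0, 12.0, 13.0, 15.0, 18.0], n_lags=2)
```

Each row of `X` can then be fed to any tabular regressor, with `y` as the target. The first `n_lags` observations produce no row because they lack a full history.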
diff --git a/data/IMBD.csv b/data/IMBD.csv
diff --git a/data/london_merged.csv b/data/london_merged.csv
diff --git a/docker/Dockerfile b/docker/Dockerfile
@@ -15,7 +15,6 @@ FROM jupyterq AS mlnotebooks
 
 COPY requirements.txt README.md /opt/kx/mlnotebooks/
 COPY data/ /opt/kx/mlnotebooks/data/
-COPY images/ /opt/kx/mlnotebooks/images/
 COPY notebooks/ /opt/kx/mlnotebooks/notebooks/
 COPY utils/ /opt/kx/mlnotebooks/utils/
 #hack, better way, tensorflow-gpu should be used if possible
@@ -65,10 +64,10 @@ USER kx
 RUN . /opt/conda/etc/profile.d/conda.sh \
 	&& conda activate kx \ 
 	&& conda install --file /opt/kx/nlp/requirements.txt \ 
-	&& conda update wrapt \
 	&& pip install -r /opt/kx/mlnotebooks/requirements.txt \
 	&& conda install -c anaconda graphviz \
-	&& conda install -c conda-forge --file /opt/kx/ml/requirements.txt \ 
+        && pip install pip==9.0.1 \
+        && pip install -r /opt/kx/ml/requirements.txt \
 	&& conda install -c conda-forge --file /opt/kx/automl/requirements.txt \
 	&& conda clean -y --all \
 	&& python -m spacy download en \

diff --git a/notebooks/01 Decision Trees.ipynb b/notebooks/01 Decision Trees.ipynb
diff --git a/notebooks/02 Random Forests.ipynb b/notebooks/02 Random Forests.ipynb
diff --git a/notebooks/03 Neural Networks.ipynb b/notebooks/03 Neural Networks.ipynb
diff --git a/notebooks/04 Dimensionality Reduction.ipynb b/notebooks/04 Dimensionality Reduction.ipynb
diff --git a/notebooks/05 Feature Engineering.ipynb b/notebooks/05 Feature Engineering.ipynb
diff --git a/notebooks/06 Feature Extraction and Selection.ipynb b/notebooks/06 Feature Extraction and Selection.ipynb
diff --git a/notebooks/07 Cross Validation.ipynb b/notebooks/07 Cross Validation.ipynb
diff --git a/notebooks/08 Natural Language Processing.ipynb b/notebooks/08 Natural Language Processing.ipynb
@@ -361,17 +361,17 @@
    ],
    "source": [
     "/ plot occurence of top terms per chapter\n",
-    "plt[`:figure][`figsize pykw 20 10];\n",
+    ".util.plt[`:figure][`figsize pykw 20 10];\n",
     "{a:exec chapter from tab where term=x;\n",
     " b:exec occurences from tab where term=x;\n",
-    " plt[`:plot][a;b];\n",
+    " .util.plt[`:plot][a;b];\n",
     " }each key 10#keywords; \n",
     "\n",
-    "plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
-    "plt[`:ylabel]\"Occurences\";\n",
-    "plt[`:xlabel]\"Chapter\";\n",
-    "plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
-    "plt[`:show][];"
+    ".util.plt[`:title]\"The occurences per chapter of the top 10 keywords\";\n",
+    ".util.plt[`:ylabel]\"Occurences\";\n",
+    ".util.plt[`:xlabel]\"Chapter\";\n",
+    ".util.plt[`:legend][key 10#keywords;`loc pykw\"upper left\"];\n",
+    ".util.plt[`:show][];"
    ]
   },
   {
@@ -1146,7 +1146,7 @@
    "source": [
     "#This table can then be used to plot a graph. The below example was rendered in Analyst for Kx, where node size represents email volume.\n",
     "\n",
-    "<img src=\"../images/network.png\" />"
+    "<img src=\"images/network.png\" />"
    ]
   },
   {
@@ -1466,7 +1466,7 @@
    "file_extension": ".q",
    "mimetype": "text/x-q",
    "name": "q",
-   "version": "3.6.0"
+   "version": "4.0"
   }
  },
  "nbformat": 4,

diff --git a/notebooks/09 K Nearest Neighbours.ipynb b/notebooks/09 K Nearest Neighbours.ipynb