Data Science, Machine Learning, and Artificial Intelligence Resources

Welcome!

Here is a non-exhaustive, work-in-progress set of resources for data science, machine learning, artificial intelligence, data and text analytics, and data visualization.

I've also included links for web and API development, programming languages, DevOps tools, cloud computing, and more.

Note that resources are listed in no particular order of preference or relevance. Well... maybe except for my blog :)

Blogs

- [InnoArchiTech](http://www.innoarchitech.com/)
- [Flowing Data](http://flowingdata.com/)
- [KDnuggets](http://www.kdnuggets.com/)
- [R-bloggers](https://www.r-bloggers.com/)
- [Analytics Vidhya](https://www.analyticsvidhya.com/blog/)
- [Statistical Modeling, Causal Inference, and Social Science](http://andrewgelman.com/)
- [Simply Statistics](http://simplystatistics.org/)
- [Walking Randomly](http://www.walkingrandomly.com/)
- [FastML](http://fastml.com/)
- [No Free Hunch](http://blog.kaggle.com/)
- [Machine Learning Mastery](http://machinelearningmastery.com/)
- [Data Science Weekly](https://www.datascienceweekly.org/)
- [Edwin Chen](http://blog.echen.me/)
- [Harvard Data Science](http://harvarddatascience.com/)

GitHub Repos

Notebooks

- [Data science IPython notebooks](https://github.com/donnemartin/data-science-ipython-notebooks)
- [Data-Analysis-and-Machine-Learning-Projects](https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects/blob/master/example-data-science-notebook/Example%20Machine%20Learning%20Notebook.ipynb)
- [machine_learning](https://github.com/masinoa/machine_learning)
- [ipython-notebooks](https://github.com/jdwittenauer/ipython-notebooks)
- [Spark Notebook](https://github.com/andypetrella/spark-notebook)

Book Resources

- [Python Machine Learning book resources](https://github.com/rasbt/python-machine-learning-book)
- [Python Machine Learning book FAQ](https://github.com/rasbt/python-machine-learning-book/tree/master/faq)
- [Learning-Predictive-Analytics-with-R](https://github.com/PacktPublishing/Learning-Predictive-Analytics-with-R)
- [Data Science from Scratch book resources](https://github.com/joelgrus/data-science-from-scratch)
- [IPython Cookbook materials](https://github.com/ipython-books/cookbook-code)
- [Python Data Science Handbook Supplemental Materials](https://github.com/jakevdp/PythonDataScienceHandbook)

Cheats

- [GitHub markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet)
- [GitHub markdown guide](https://guides.github.com/features/mastering-markdown/)
- [Machine learning algorithm cheat sheet](https://azure.microsoft.com/en-us/documentation/articles/machine-learning-algorithm-cheat-sheet/)
- [11 Steps for Data Exploration in R](https://www.analyticsvidhya.com/blog/2015/10/cheatsheet-11-steps-data-exploration-with-codes/)
- [AI Cheat Sheet](http://alexoner.github.io/AI-cheat-sheet/)
- [Data Science Cheat Sheet](http://www.datasciencecentral.com/profiles/blogs/data-science-cheat-sheet)

Web Resources

- [Data Science Weekly resources](https://www.datascienceweekly.org/data-science-resources)
- [Data School resources](http://www.dataschool.io/resources/)
- [Open Source Data Science Masters](http://datasciencemasters.org/)
- [Open Source Data Science Masters - GitHub](https://github.com/datasciencemasters)
- [Choosing the right estimator](http://scikit-learn.org/stable/tutorial/machine_learning_map/)
- [Machine Intelligence 3.0](https://format-com-cld-res.cloudinary.com/image/private/s--RCb7PzQR--/c_crop,h_1500,w_2000,x_0,y_0/c_fill,g_center,h_855,w_1140/a_auto,dpr_2,fl_keep_iptc.progressive,q_95/v1/19575bcc040a6dcff3097618ec9c585e/MI-Landscape-3_7.png)

Datasets

- [Awesome Public Datasets](https://github.com/caesar0301/awesome-public-datasets)
- [AWS Public Datasets](https://aws.amazon.com/datasets/)
- [100+ Interesting Data Sets for Statistics](http://rs.io/100-interesting-data-sets-for-statistics/)
- [Kaggle Datasets](https://www.kaggle.com/datasets)
- [FiveThirtyEight data](https://github.com/fivethirtyeight/data)
- [Google BigQuery Public Datasets](https://cloud.google.com/bigquery/public-data/)
- [UCI Machine Learning Repository](http://archive.ics.uci.edu/ml/datasets.html)
- [Stanford Large Network Dataset Collection](http://snap.stanford.edu/data/#!)
- [THE MNIST DATABASE of handwritten digits](http://yann.lecun.com/exdb/mnist/)
- [THE Wikipedia Corpus](http://corpus.byu.edu/wiki/)

IDEs

- [Sublime Text](https://www.sublimetext.com/)
- [RStudio](https://support.rstudio.com/hc/en-us/categories/200035113-Documentation)

Programming Languages and OS

- [Python](https://docs.python.org/3/)
- [R](https://cran.r-project.org/manuals.html)
- [JavaScript](https://developer.mozilla.org/en-US/docs/Web/JavaScript)
- [SQL](https://en.wikipedia.org/wiki/SQL)
- [Julia](http://docs.julialang.org/en/release-0.5/)
- [Scala](http://docs.scala-lang.org/)
- [Java](https://docs.oracle.com/javase)
- [C++](http://devdocs.io/cpp/)
- [HTML](https://developer.mozilla.org/en-US/docs/Web/HTML)
- [CSS](https://developer.mozilla.org/en-US/docs/Web/CSS)
- [Bash](http://ss64.com/bash/)
- [Ubuntu](https://help.ubuntu.com/)
- [JSON](http://www.json.org/)
- [JSON-RPC](http://json-rpc.org/)
- [YAML](http://yaml.org/spec/1.2/spec.html)
- [Git](https://git-scm.com/documentation)

Database and Big Data

- [AWS](https://aws.amazon.com/documentation/)
    - [Redshift](https://aws.amazon.com/documentation/redshift/) - Fast, simple, cost-effective data warehousing
    - [DynamoDB](https://aws.amazon.com/documentation/dynamodb/) - Fast and flexible NoSQL database service for any scale
    - [RDS](https://aws.amazon.com/documentation/rds/) - Amazon Relational Database Service
        + [Amazon Aurora](https://aws.amazon.com/rds/aurora/getting-started/) - MySQL-compatible relational database with 5X performance
        + [Oracle](https://docs.oracle.com/en/database/)
        + [Microsoft SQL Server](https://msdn.microsoft.com/en-us/library/mt590198(v=sql.1).aspx)
        + [PostgreSQL](https://www.postgresql.org/docs/)
        + [MySQL](https://dev.mysql.com/doc/)
        + [MariaDB](https://mariadb.org/learn/)
    + [Kinesis](https://aws.amazon.com/documentation/kinesis/) - Real-time streaming data in the AWS cloud
        * Firehose - Easily load real-time streaming data into AWS
        * Analytics - Get actionable insights from streaming data in real-time
        * Streams - Build custom applications that process or analyze streaming data for specialized needs
    + [Amazon EMR](https://aws.amazon.com/documentation/elastic-mapreduce/) - Easily Run and Scale Apache Hadoop, Spark, HBase, Presto, Hive, and other Big Data Frameworks
    + [QuickSight](https://aws.amazon.com/documentation/quicksight/) - Fast, easy to use business analytics
    + [Machine Learning](https://aws.amazon.com/documentation/machine-learning/)
    + [IoT](https://aws.amazon.com/documentation/iot/) - Easily and securely connect devices to the cloud
    + [AWS Data Pipeline](https://aws.amazon.com/documentation/data-pipeline/) - Easily automate the movement and transformation of data
- [Google Cloud Platform](https://cloud.google.com/docs/)
    + [BigQuery](https://cloud.google.com/bigquery/docs/) - Fully managed, petabyte scale, low cost analytics data warehouse
    + [Dataflow](https://cloud.google.com/dataflow/docs/) - A fully-managed cloud service and programming model for batch and streaming big data processing
    + [Dataproc](https://cloud.google.com/dataproc/docs/) - A managed Apache Spark and Apache Hadoop service that lets you take advantage of open source data tools for batch processing, querying, streaming, and machine learning
    + [Datalab](https://cloud.google.com/datalab/docs/) - An easy to use interactive tool for large-scale data exploration, analysis, and visualization
    + [Machine Learning](https://cloud.google.com/ml/docs/) - Machine Learning on any data, of any size
    + [Prediction API](https://cloud.google.com/prediction/docs/) - A RESTful API to build Machine Learning models
    + [Jobs API](https://cloud.google.com/jobs-api/) - Job search and discovery powered by machine learning
    + [Natural Language API](https://cloud.google.com/natural-language/docs/) - Provides natural language understanding technologies to developers, including sentiment analysis, entity recognition, and syntax analysis
    + [Speech API](https://cloud.google.com/speech/docs/) - Easy integration of Google speech recognition technologies into developer applications
    + [Translate API](https://cloud.google.com/translate/docs/) - Dynamically translate text between thousands of language pairs
    + [Vision API](https://cloud.google.com/vision/docs/) - Easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content
    + [Pub/Sub](https://cloud.google.com/pubsub/docs/) - A fully-managed real-time messaging service that allows you to send and receive messages between independent applications
- [Apache Foundation](https://www.apache.org/)
    + [HBase](https://hbase.apache.org/book.html) - Apache HBase is the Hadoop database, a distributed, scalable, big data store
    + [Hadoop](http://hadoop.apache.org/docs/current/) - Open-source software for reliable, scalable, distributed computing
    + [Spark](http://spark.apache.org/docs/latest/) - A fast and general engine for large-scale data processing
    + [Hive](https://cwiki.apache.org/confluence/display/Hive/LanguageManual) - Data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL
    + [Pig](http://pig.apache.org/docs/r0.16.0/) - A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs
    + [Kylin](http://kylin.apache.org/docs15/) - An open source Distributed Analytics Engine designed to provide SQL interface and multi-dimensional analysis (OLAP) on Hadoop supporting extremely large datasets, originally contributed by eBay Inc
    + [Lens](http://lens.apache.org/user/index.html) - A unified analytics interface
    + [Ignite](https://apacheignite.readme.io/docs) - A high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real-time, orders of magnitude faster than possible with traditional disk-based or flash technologies
    + [Brooklyn](https://brooklyn.apache.org/documentation/index.html) - A framework for modeling, monitoring, and managing applications through autonomic blueprints
    + [Apex](https://apex.apache.org/docs.html) - Enterprise-grade unified stream and batch processing engine
    + [Tajo](http://tajo.apache.org/docs/current/index.html) - A robust big data relational and distributed data warehouse system for Apache Hadoop
    + [Tez](https://tez.apache.org/user_guides.html) - An application framework which allows for a complex directed-acyclic-graph of tasks for processing data
    + [Bigtop](http://bigtop.apache.org/) - Project for Infrastructure Engineers and Data Scientists looking for comprehensive packaging, testing, and configuration of the leading open source big data components
    + [REEF](http://reef.apache.org/introduction.html) - Apache REEF (Retainable Evaluator Execution Framework) is a library for developing portable applications for cluster resource managers such as Apache Hadoop YARN or Apache Mesos
    + [Storm](http://storm.apache.org/index.html) - A free and open source distributed realtime computation system
    + [Kafka](https://kafka.apache.org/) - A distributed streaming platform
- NoSQL
    - [MongoDB](https://docs.mongodb.com/) - NoSQL document store
    - [Redis](http://redis.io/documentation) - An open source (BSD licensed), in-memory data structure store, used as database, cache and message broker
    - [BigTable](https://cloud.google.com/bigtable/docs/) - Fast, fully managed, massively scalable NoSQL database service
    - [Neo4j](https://neo4j.com/docs/) - World's fastest and most scalable graph database
    - [CouchBase](http://developer.couchbase.com/documentation-archive) - A document database with a SQL-based query language that is engineered to deliver performance at scale
    - [Cassandra](http://cassandra.apache.org/doc/latest/) - Free and open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
    - [Riak](https://docs.basho.com/) - Distributed NoSQL Database
    - [CouchDB](http://docs.couchdb.org/en/2.0.0/) - NoSQL document store
- RDBMS
    + [MySQL](https://dev.mysql.com/doc/) - Open source RDBMS
    + [PostgreSQL](https://www.postgresql.org/docs/) - Open-source Object-Relational DBMS supporting almost all SQL constructs
- Static storage
    - [S3](https://aws.amazon.com/documentation/s3/) - Simple, durable, massively scalable object storage
- Search and full-text
    - [ElasticSearch](https://www.elastic.co/guide/index.html) - Service that makes it easy to deploy, operate, and scale Elasticsearch in the AWS Cloud
- Cache
    - [Memcached](https://memcached.org/) - High-performance, distributed memory object caching system

Platforms, Libraries, and Packages

- [Keras: Deep Learning library for Theano and TensorFlow](https://keras.io/) - A high-level neural networks library, written in Python and capable of running on top of either TensorFlow or Theano
- [Weka](http://www.cs.waikato.ac.nz/ml/weka/documentation.html) - A collection of machine learning algorithms for data mining tasks
- [Theano](http://deeplearning.net/software/theano/) - Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently
- [TensorFlow](https://www.tensorflow.org/versions/r0.11/api_docs/index.html) - Open source software library for numerical computation using data flow graphs
- [Anaconda](https://docs.continuum.io/) - Open data science platform powered by Python
- [Amazon Deep Scalable Sparse Tensor Network Engine (DSSTNE)](https://github.com/amznlabs/amazon-dsstne) - An Amazon developed library for building Deep Learning (DL) machine learning (ML) models
- [Torch](http://torch.ch/docs/getting-started.html#_) - A scientific computing framework with wide support for machine learning algorithms that puts GPUs first
- [Caffe](http://caffe.berkeleyvision.org/) - A deep learning framework made with expression, speed, and modularity in mind
- [DL4J](https://deeplearning4j.org/) - Open-Source, Distributed, Deep Learning Library for the JVM
- [DataRobot](https://www.datarobot.com/) - Automated Machine Learning
- [IBM Watson](http://www.ibm.com/watson/developercloud/doc/getting_started/) - Cognitive computing features in your app using IBM Watson's Language, Vision, Speech and Data APIs
- [Microsoft Machine Learning](Machine Learning) - Powerful cloud based analytics
- Python
    + [IPython Documentation](http://ipython.readthedocs.io/en/stable/) - Comprehensive environment for interactive and exploratory computing
    + [Jupyter notebook](http://jupyter-notebook.readthedocs.io/en/latest/) - A web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text
    + [Matplotlib](http://matplotlib.org/) - A Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms
    + [Natural Language Toolkit](http://www.nltk.org/) - A leading platform for building Python programs to work with human language data
    + [Numpy](https://docs.scipy.org/doc/) - The fundamental package for scientific computing with Python
    + [Scipy](https://docs.scipy.org/doc/) - A Python-based ecosystem of open-source software for mathematics, science, and engineering
    + [Pandas](http://pandas.pydata.org/pandas-docs/stable/) - An open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language
    + [PyBrain](http://pybrain.org/) - Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library
    + [Scikit-image](http://scikit-image.org/) - A collection of algorithms for image processing
    + [Scikit-learn](http://scikit-learn.org/stable/documentation.html) - A Python module for machine learning
    + [Seaborn](http://seaborn.pydata.org/api.html) - A Python visualization library based on matplotlib
    + [StatsModels](http://statsmodels.sourceforge.net/documentation.html) - A Python module that allows users to explore data, estimate statistical models, and perform statistical tests
    + [Pattern](https://github.com/clips/pattern) - Web mining module for Python, with tools for scraping, natural language processing, machine learning, network analysis and visualization
    + [Scrapy](https://scrapy.org/doc/) - An open source and collaborative framework for extracting the data you need from websites
    + [Bokeh](http://bokeh.pydata.org/en/latest/docs/user_guide.html) - A Python interactive visualization library that targets modern web browsers for presentation
    + [Basemap](http://matplotlib.org/basemap/users/index.html) - A library for plotting 2D data on maps in Python
    + [NetworkX](http://networkx.github.io/documentation.html) - A Python language software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks
    + [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) - A Python library for pulling data out of HTML and XML files
- R
    + [General CRAN List - By task](https://cran.r-project.org/web/views/)
    + [General CRAN List - NLP/Text analytics](https://cran.r-project.org/web/views/NaturalLanguageProcessing.html)
    + [General CRAN List - Machine learning](https://cran.r-project.org/web/views/MachineLearning.html)
    + [ggplot2](http://docs.ggplot2.org/current/) - A plotting system for R
    + [ISLR](https://cran.r-project.org/web/packages/ISLR/index.html) - The collection of datasets used in the book "An Introduction to Statistical Learning with Applications in R"
    + [Rcpp](https://cran.r-project.org/web/packages/Rcpp/index.html) - Provides R functions as well as C++ classes which offer a seamless integration of R and C++
    + [dplyr](https://cran.r-project.org/web/packages/dplyr/index.html) - A fast, consistent tool for working with data frame like objects, both in memory and out of memory
    + [plyr](https://cran.r-project.org/web/packages/plyr/index.html) - A set of tools that solves a common set of problems
    + [stringr](https://cran.r-project.org/web/packages/stringr/index.html) - A consistent, simple and easy to use set of wrappers around the fantastic 'stringi' package
    + [shiny](https://cran.r-project.org/web/packages/shiny/index.html) - Easy to build interactive web applications with R
    + [knitr](https://cran.r-project.org/web/packages/knitr/index.html) - A general-purpose tool for dynamic report generation in R using Literate Programming techniques
    + [readr](https://cran.r-project.org/web/packages/readr/index.html) - Read flat/tabular text files from disk (or a connection)
    + [R Markdown](https://cran.r-project.org/web/packages/rmarkdown/index.html) - Convert R Markdown documents into a variety of formats
    + [tidyr](https://cran.r-project.org/web/packages/tidyr/index.html) - Data tidying (not general reshaping or aggregating) and works well with 'dplyr' data pipelines
    + [lubridate](https://cran.r-project.org/web/packages/lubridate/index.html) - Functions to work with date-times and time-spans
    + [lme4](https://cran.r-project.org/web/packages/lme4/index.html) - Fit linear and generalized linear mixed-effects models
    + [nlme](https://cran.r-project.org/web/packages/nlme/index.html) - Fit and compare Gaussian linear and nonlinear mixed-effects models
    + [mime](https://cran.r-project.org/web/packages/mime/index.html) - Guesses the MIME type from a filename extension using the data derived from /etc/mime.types in UNIX-type systems
    + [mda](https://cran.r-project.org/web/packages/mda/index.html) - Mixture and flexible discriminant analysis, multivariate adaptive regression splines (MARS), BRUTO, ...
    + [lasso2](https://cran.r-project.org/web/packages/lasso2/index.html) - Routines and documentation for solving regression problems while imposing an L1 constraint on the estimates
    + [lars](https://cran.r-project.org/web/packages/lars/index.html) - Efficient procedures for fitting an entire lasso sequence with the cost of a single least squares fit
    + [digest](https://cran.r-project.org/web/packages/digest/index.html) - Implementation of a function 'digest()' for the creation of hash digests of arbitrary R objects (using the 'md5', 'sha-1', 'sha-256', 'crc32', 'xxhash' and 'murmurhash' algorithms) permitting easy comparison of R language objects, as well as a function 'hmac()' to create hash-based message authentication code
    + [reshape2](https://cran.r-project.org/web/packages/reshape2/index.html) - Flexibly restructure and aggregate data using just two functions: 'melt' and 'dcast' (or 'acast')
    + [colorspace](https://cran.r-project.org/web/packages/colorspace/index.html) - Carries out mapping between assorted color spaces including RGB, HSV, HLS, CIEXYZ, CIELUV, HCL (polar CIELUV), CIELAB and polar CIELAB
    + [RColorBrewer](https://cran.r-project.org/web/packages/RColorBrewer/index.html) - Provides color schemes for maps (and other graphics)
    + [manipulate](https://cran.r-project.org/web/packages/manipulate/index.html) - Interactive plotting functions for use within RStudio
    + [scales](https://cran.r-project.org/web/packages/scales/index.html) - Graphical scales map data to aesthetics, and provide methods for automatically determining breaks and labels for axes and legends
    + [labeling](https://cran.r-project.org/web/packages/labeling/index.html) - Provides a range of axis labeling algorithms
    + [proto](https://cran.r-project.org/web/packages/proto/index.html) - An object oriented system using object-based, also called prototype-based, rather than class-based object oriented ideas
    + [randomForest](https://cran.r-project.org/web/packages/randomForest/index.html) - Classification and regression based on a forest of trees using random inputs
    + [glmnet](https://cran.r-project.org/web/packages/glmnet/index.html) - Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression and the Cox model
    + [caret](https://cran.r-project.org/web/packages/caret/index.html) - Misc functions for training and plotting classification and regression models
    + [ggvis](https://cran.r-project.org/web/packages/ggvis/index.html) - An implementation of an interactive grammar of graphics, taking the best parts of 'ggplot2', combining them with the reactive framework of 'shiny' and drawing web graphics using 'vega'
    + [rgl](https://cran.r-project.org/web/packages/rgl/index.html) - Provides medium to high level functions for 3D interactive graphics, including functions modelled on base graphics (plot3d(), etc.) as well as functions for constructing representations of geometric objects (cube3d(), etc.)
    + [htmlwidgets](https://cran.r-project.org/web/packages/htmlwidgets/index.html) - A framework for creating HTML widgets that render in various contexts including the R console, 'R Markdown' documents, and 'Shiny' web applications
    + [leaflet](https://cran.r-project.org/web/packages/leaflet/index.html) - Create and customize interactive maps using the 'Leaflet' JavaScript library and the 'htmlwidgets' package
    + [dygraphs](https://cran.r-project.org/web/packages/dygraphs/index.html) - An R interface to the 'dygraphs' JavaScript charting library
    + [googleVis](https://cran.r-project.org/web/packages/googleVis/index.html) - R interface to Google Charts API, allowing users to create interactive charts based on data frames
    + [zoo](https://cran.r-project.org/web/packages/zoo/index.html) - An S3 class with methods for totally ordered indexed observations. It is particularly aimed at irregular time series of numeric vectors/matrices and factors
    + [RCurl](https://cran.r-project.org/web/packages/RCurl/index.html) - A wrapper for 'libcurl'. Provides functions to allow one to compose general HTTP requests, convenient functions to fetch URIs, get & post forms, etc., and process the results returned by the Web server
    + [jsonlite](https://cran.r-project.org/web/packages/jsonlite/index.html) - A fast JSON parser and generator optimized for statistical data and the web
    + [bitops](https://cran.r-project.org/web/packages/bitops/index.html) - Functions for bitwise operations on integer vectors
    + [devtools](https://cran.r-project.org/web/packages/devtools/index.html) - Collection of package development tools
    + [magrittr](https://cran.r-project.org/web/packages/magrittr/index.html) - Provides a mechanism for chaining commands with a new forward-pipe operator, %>%
    + [packrat](https://cran.r-project.org/web/packages/packrat/index.html) - Manage the R packages your project depends on in an isolated, portable, and reproducible way
    + [haven](https://cran.r-project.org/web/packages/haven/index.html) - Import foreign statistical formats into R via the embedded 'ReadStat' C library
    + [DT](https://cran.r-project.org/web/packages/DT/index.html) - Data objects in R can be rendered as HTML tables using the JavaScript library 'DataTables' (typically via R Markdown or Shiny)
    + [mice](https://cran.r-project.org/web/packages/mice/index.html) - Multiple imputation using Fully Conditional Specification (FCS) implemented by the MICE algorithm
    + [rpart](https://cran.r-project.org/web/packages/rpart/index.html) - Recursive partitioning for classification, regression and survival trees
    + [party](https://cran.r-project.org/web/packages/party/index.html) - A computational toolbox for recursive partitioning
    + [nnet](https://cran.r-project.org/web/packages/nnet/index.html) - Software for feed-forward neural networks with a single hidden layer, and for multinomial log-linear models
    + [e1071](https://cran.r-project.org/web/packages/e1071/index.html) - Functions for latent class analysis, short time Fourier transform, fuzzy clustering, support vector machines, shortest path computation, bagged clustering, naive Bayes classifier, ...
    + [kernlab](https://cran.r-project.org/web/packages/kernlab/index.html) - Kernel-based machine learning methods for classification, regression, clustering, novelty detection, quantile regression and dimensionality reduction
    + [gbm](https://cran.r-project.org/web/packages/gbm/index.html) - Includes regression methods for least squares, absolute loss, t-distribution loss, quantile regression, logistic, multinomial logistic, Poisson, Cox proportional hazards partial likelihood, AdaBoost exponential loss, Huberized hinge loss, and Learning to Rank measures (LambdaMart)
    + [wordcloud](https://cran.r-project.org/web/packages/wordcloud/index.html) - Pretty word clouds
    + [C50](https://cran.r-project.org/web/packages/C50/index.html) - C5.0 decision trees and rule-based models for pattern recognition
    + [class](https://cran.r-project.org/web/packages/class/index.html) - Various functions for classification, including k-nearest neighbour, Learning Vector Quantization and Self-Organizing Maps
    + [neuralnet](https://cran.r-project.org/web/packages/neuralnet/index.html) - Training of neural networks using backpropagation, resilient backpropagation with (Riedmiller, 1994) or without weight backtracking (Riedmiller and Braun, 1993) or the modified globally convergent version by Anastasiadis et al. (2005)
    + [tm](https://cran.r-project.org/web/packages/tm/index.html) - A framework for text mining applications within R
    + [gmodels](https://cran.r-project.org/web/packages/gmodels/index.html) - Various R programming tools for model fitting
    + [RODBC](https://cran.r-project.org/web/packages/RODBC/index.html) - An ODBC database interface
    + [princurve](https://cran.r-project.org/web/packages/princurve/index.html) - Fits a principal curve to a data matrix in arbitrary dimensions

Cloud/SaaS/PaaS/IaaS

- [AWS](https://aws.amazon.com/documentation/)
    + [Lambda](https://aws.amazon.com/documentation/lambda/) - Serverless compute. AWS Lambda lets you run code without provisioning or managing servers
    + [EC2](https://aws.amazon.com/documentation/ec2/) - Web service that provides resizable compute capacity in the cloud
    + [Elastic Beanstalk](https://aws.amazon.com/documentation/elastic-beanstalk/) - Deploy and scale web applications and services
    + [ElastiCache](https://aws.amazon.com/documentation/elasticache/) - Web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud
    + [Amazon Simple Notification Service (SNS)](https://aws.amazon.com/documentation/sns/) - Fully managed and highly scalable push messaging
    + [Amazon Simple Email Service (Amazon SES)](https://aws.amazon.com/documentation/ses/) - Reliable, cost-effective email platform
    + [Amazon Simple Queue Service (SQS)](https://aws.amazon.com/documentation/sqs/) - A fast, reliable, scalable, fully managed message queuing service
- [Apache Projects List (by category)](https://projects.apache.org/projects.html?category)
- [Google Cloud Platform](https://cloud.google.com/docs/)
- [Digital Ocean](https://developers.digitalocean.com/documentation/)

Web, API, and DevOps

- [Node.js](https://nodejs.org/en/docs/)
- [AngularJS](https://docs.angularjs.org/guide)
- [React](https://facebook.github.io/react/docs/hello-world.html)
- [Docker](https://docs.docker.com/)
- [SaltStack](https://docs.saltstack.com/en/latest/)
- [Font Awesome](http://fontawesome.io/)
- [Bootstrap](http://getbootstrap.com/)
- [Jekyll](https://jekyllrb.com/docs/home/)
- [Grunt](http://gruntjs.com/getting-started)
- [Gulp](https://github.com/gulpjs/gulp/blob/master/docs/README.md)
- [Nginx](https://nginx.org/en/docs/)

Books

- [An Introduction to Statistical Learning](http://www-bcf.usc.edu/~gareth/ISL/index.html)
- [The Elements of Statistical Learning](http://statweb.stanford.edu/~tibs/ElemStatLearn/)
- [Mastering Predictive Analytics with R](https://www.packtpub.com/application-development/mastering-predictive-analytics-r)
- [Machine Learning with R](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r)
- [Python Machine Learning](https://www.packtpub.com/big-data-and-business-intelligence/python-machine-learning)
- [Data Science for Business](http://data-science-for-biz.com/DSB/Home.html)
- [Data Analytics Made Accessible](https://www.amazon.com/Data-Analytics-Made-Accessible-Maheshwari-ebook/dp/B00K2I2JL8)
- [Data Smart](http://www.john-foreman.com/data-smart-book.html)
- [Predictive Analytics](http://www.cs.columbia.edu/~evs/)
- [Real-World Machine Learning](https://www.manning.com/books/real-world-machine-learning)

Tutorials

- [Comparing Git Workflows](https://www.atlassian.com/git/tutorials/comparing-workflows/)

Articles

- [Best machine learning packages in R](http://www.kdnuggets.com/2015/06/top-20-r-machine-learning-packages.html)
- [The Data Science Industry: Who Does What (Infographic)](https://www.datacamp.com/community/tutorials/data-science-industry-infographic)
- [Data Science Falls Into Many Roles](http://www.forbes.com/sites/rawnshah/2015/10/06/data-science-falls-into-many-roles/)
- [What’s the Difference Between Data Science Roles?](https://www.betterbuys.com/bi/comparing-data-science-roles/)
- [Data Science Career Paths: Different Roles in the Industry](https://www.springboard.com/blog/data-science-career-paths-different-roles-industry/)
- [All the best big data tools and how to use them](https://www.import.io/post/all-the-best-big-data-tools-and-how-to-use-them/)
- [DL4J vs. Torch vs. Theano vs. Caffe vs. TensorFlow](https://deeplearning4j.org/compare-dl4j-torch7-pylearn)
- [10+ Machine Learning as a Service Platforms](http://www.butleranalytics.com/10-machine-learning-as-a-service-platforms/)
- [Python, Machine Learning, and Language Wars](http://sebastianraschka.com/blog/2015/why-python.html)
- [What are the pros and cons of offline vs. online learning?](https://www.quora.com/What-are-the-pros-and-cons-of-offline-vs-online-learning)
- [Introduction to Online Machine Learning : Simplified](https://www.analyticsvidhya.com/blog/2015/01/introduction-online-machine-learning-simplified-2/)
- [Machine Learning From Streaming Data: Two Problems, Two Solutions, Two Concerns, and Two Lessons](https://blog.bigml.com/2013/03/12/machine-learning-from-streaming-data-two-problems-two-solutions-two-concerns-and-two-lessons/)
- [Batch vs. Real Time Data Processing](http://www.datasciencecentral.com/profiles/blogs/batch-vs-real-time-data-processing)

Whitepapers
