GitHub - RLadiesMadrid/H2O_Workshop: H2O Workshop for WeCode 2018

About H2O

In H2O Docs

About this workshop

WeCodeFest slides

Requirements

About the algorithms

Generalized Linear Models (GLM)

In H2O Docs

Introduction to Generalized Linear Models

Demo H2O World

Generalized Linear Models (GLM) estimate regression models for outcomes following exponential family distributions. In addition to the Gaussian (i.e., normal) distribution, these include the Poisson, binomial, and gamma distributions. Each serves a different purpose and, depending on the choice of distribution and link function, can be used for either regression or classification.
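
To make the role of the link function concrete, here is a minimal sketch in plain Python (not the H2O API). It assumes a one-feature model and pairs three of the families with the links they are commonly used with; the coefficient values and the `predict_mean` helper are made up for illustration.

```python
import math

# Inverse link functions: map the linear predictor eta = b0 + b1 * x
# back to the expected value of the response for each family.
# Assumption: each family is paired with a commonly used link.
INVERSE_LINKS = {
    "gaussian": lambda eta: eta,                            # identity link
    "poisson": lambda eta: math.exp(eta),                   # log link
    "binomial": lambda eta: 1.0 / (1.0 + math.exp(-eta)),   # logit link
}


def predict_mean(family, intercept, slope, x):
    """Return the expected response of a one-feature GLM (hypothetical helper)."""
    eta = intercept + slope * x  # linear predictor
    return INVERSE_LINKS[family](eta)


if __name__ == "__main__":
    # Illustrative coefficients only; they are not fitted to any data.
    for family in INVERSE_LINKS:
        print(family, predict_mean(family, intercept=-1.0, slope=0.5, x=2.0))
```

The same linear predictor yields an unbounded value (Gaussian), a positive count rate (Poisson), or a probability between 0 and 1 (binomial), which is why the family choice decides whether the model acts as a regressor or a classifier.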

Options

Datasets are commonly split into training, testing, and validation sets.
- A training dataset is a set of examples used for learning, that is, to fit the parameters of a model such as a classifier.
- A validation dataset is a set of examples used to tune the hyperparameters of a model. Like the test set, it should follow the same probability distribution as the training dataset.
- A test dataset is a dataset that is independent of the training dataset, but that follows the same probability distribution as the training dataset.
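
The three-way split described above can be sketched in plain Python as follows. The 60/20/20 proportions, the seed value, and the `split_dataset` helper are assumptions chosen for illustration, not settings taken from the workshop.

```python
import random


def split_dataset(rows, train_frac=0.6, valid_frac=0.2, seed=42):
    """Shuffle rows and cut them into train, validation, and test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # shuffle so all parts share one distribution
    n_train = round(len(shuffled) * train_frac)
    n_valid = round(len(shuffled) * valid_frac)
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains
    return train, valid, test


train, valid, test = split_dataset(range(100))
print(len(train), len(valid), len(test))
```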
K-fold cross-validation: This option validates a model internally, i.e., it estimates model performance without setting aside a separate validation split. It also avoids the statistical issues of relying on a single validation split (which might be a “lucky” split, especially for imbalanced data). Good values for K are around 5 to 10. Before “trusting” the main model, it is always a good idea to compare the K per-fold validation metrics to check the stability of the estimate.
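
As a rough sketch of the idea in plain Python: every row is held out exactly once, and the spread of the per-fold metrics hints at how stable the estimate is. The modulo fold assignment and both helper names are assumptions made for illustration.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (train_idx, holdout_idx) pairs; each row is held out exactly once."""
    fold_of = [i % k for i in range(n_rows)]  # simple modulo fold assignment
    for fold in range(k):
        holdout = [i for i in range(n_rows) if fold_of[i] == fold]
        train = [i for i in range(n_rows) if fold_of[i] != fold]
        yield train, holdout


def spread(metrics):
    """Range of the per-fold metrics; a large spread suggests an unstable estimate."""
    return max(metrics) - min(metrics)


if __name__ == "__main__":
    for train_idx, holdout_idx in k_fold_indices(10, k=5):
        print("train:", train_idx, "holdout:", holdout_idx)
```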
Seed: This option specifies the random number generator (RNG) seed for algorithms that are dependent on randomization. When a seed is defined, the algorithm will behave deterministically.
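
The effect of fixing a seed can be illustrated with Python's standard `random` module (a sketch of the general principle, not of H2O's internals; `sample_rows` and the seed value are hypothetical):

```python
import random


def sample_rows(n_rows, n_samples, seed=None):
    """Draw row indices at random; a fixed seed makes the draw repeatable."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.sample(range(n_rows), n_samples)


a = sample_rows(1000, 5, seed=1234)
b = sample_rows(1000, 5, seed=1234)
print(a == b)  # the same seed reproduces the same draw
```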

Word2vec

In H2O Docs

The Word2vec algorithm takes a text corpus as an input and produces the word vectors as output. The algorithm first creates a vocabulary from the training text data and then learns vector representations of the words. The vector space can include hundreds of dimensions, with each unique word in the sample corpus being assigned a corresponding vector in the space. In addition, words that share similar contexts in the corpus are placed in close proximity to one another in the space.
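
"Close proximity" between word vectors is often measured with cosine similarity. The sketch below assumes that measure and uses made-up 3-dimensional vectors; real Word2vec vectors are learned from a corpus and have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Made-up toy vectors for illustration only; not output from any trained model.
vectors = {
    "flight": [0.9, 0.1, 0.2],
    "plane": [0.8, 0.2, 0.1],
    "banana": [0.1, 0.9, 0.7],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to the given word's vector."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine_similarity(vectors[word], vectors[w]))


if __name__ == "__main__":
    print(most_similar("flight", vectors))
```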

Vignettes

GLM Booklet R Vignette.
