Environmental Data Science 232 Machine Learning Lab 01 Assignment

Overview

This machine learning analysis was completed as an assignment for my Master's program course, Environmental Data Science 232: Machine Learning. Our professor, Dr. Ben Best, assigned it as an introduction to machine learning: predicting the presence of a chosen species from observations and environmental data found on the Global Biodiversity Information Facility (GBIF) site. It follows the guidance in "Species distribution modeling | R Spatial".

My chosen species is coyote brush (Baccharis pilularis). Baccharis pilularis is native to the west coast of North America (Oregon, California, and Baja California, Mexico). It is a shrub in the Asteraceae (sunflower) family with oblanceolate to obovate toothed leaves and a panicle-like inflorescence with staminate flowers that mimic snow when mature, and it is generally sticky (not a pun).

Learning Objectives

Explore
- Fetch species observations from the Global Biodiversity Information Facility (GBIF.org) using an R package that wraps a function around their API.
- Fetch environmental data for defining the environmental relationships in the species distribution model (SDM).
- Generate pseudo-absence (background) points to contrast with the species presence points in the SDM.
- Extract the underlying environmental data at each point.
- Create term plots of each environmental predictor against the species response.
- Create a pairs plot to show correlation between variables and avoid multicollinearity (see 8.2 Many predictors in a model).
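
The background-point and multicollinearity steps above can be sketched in base R. This is a minimal illustration on simulated data, not the lab's actual code: the coordinates, the predictor names (`temp`, `temp_max`, `precip`), and the 0.9 correlation threshold are all assumptions made up for the example.

```r
set.seed(42)

# Hypothetical presence points (lon, lat); stand-ins for real GBIF observations.
presence <- data.frame(
  lon = runif(50, -124, -117),
  lat = runif(50, 32, 42)
)

# Background (pseudo-absence) points sampled uniformly within the
# bounding box of the presence points.
n_bg <- nrow(presence)
background <- data.frame(
  lon = runif(n_bg, min(presence$lon), max(presence$lon)),
  lat = runif(n_bg, min(presence$lat), max(presence$lat))
)

# Combine into one table with a binary response column.
pts <- rbind(
  cbind(presence, present = 1),
  cbind(background, present = 0)
)

# Made-up environmental values standing in for raster extraction at each point.
pts$temp     <- 25 - 0.5 * (pts$lat - 32) + rnorm(nrow(pts), sd = 1)
pts$temp_max <- pts$temp + 5 + rnorm(nrow(pts), sd = 0.2)  # nearly collinear with temp
pts$precip   <- runif(nrow(pts), 200, 1500)

# Correlation matrix of the predictors (the numbers behind a pairs plot).
cor_mat <- cor(pts[, c("temp", "temp_max", "precip")])
print(round(cor_mat, 2))

# Flag predictor pairs whose absolute correlation exceeds the assumed threshold.
high <- which(abs(cor_mat) > 0.9 & upper.tri(cor_mat), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]]
)
print(flagged)
```

A flagged pair suggests dropping one of the two variables before fitting a model; calling `pairs()` on the same columns would show the relationships graphically.
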
Logistic Regression seen as an evolution of techniques
- Linear Model shows the simplest multivariate regression, but its predictions can fall outside the binary values.
- Generalized Linear Model uses a logit transformation to constrain the outputs to lie between 0 and 1.
- Generalized Additive Model allows for "wiggle" in predictor terms.
- Maxent (Maximum Entropy) is a presence-only modeling technique that allows for a more complex set of shapes between predictor and response.
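
The difference between the linear model and the generalized linear model can be seen on a small simulated example. This is a sketch under stated assumptions: the single predictor `temp`, the simulated presence probabilities, and the new temperatures used for prediction are all invented for illustration.

```r
set.seed(1)

# Simulated data: presence probability rises with temperature (assumed relationship).
n <- 200
temp <- runif(n, 5, 30)
present <- rbinom(n, 1, plogis(-6 + 0.4 * temp))
d <- data.frame(present, temp)

# Linear model: treats the 0/1 response as a continuous value.
mdl_lm <- lm(present ~ temp, data = d)

# Generalized linear model with a logit link (logistic regression).
mdl_glm <- glm(present ~ temp, family = binomial(link = "logit"), data = d)

# Predict at temperatures beyond the observed range.
new <- data.frame(temp = c(0, 15, 40))
pred_lm  <- predict(mdl_lm, newdata = new)
pred_glm <- predict(mdl_glm, newdata = new, type = "response")

# The lm predictions can leave the [0, 1] interval; the glm predictions cannot.
print(round(cbind(temp = new$temp, lm = pred_lm, glm = pred_glm), 3))
```

Requesting `type = "response"` returns the GLM predictions on the probability scale; the default returns them on the logit scale.
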
Decision Trees
Use decision trees as a classification technique on the data, with the response being categorical (factor(present)).
- Recursive Partitioning (rpart()): originally called classification & regression trees (CART), but that name is trademarked (Breiman et al., 1984).
- Random Forest (RandomForest()): actually an ensemble model, i.e., many trees combined.
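
A classification tree with a categorical response can be sketched with `rpart()`, which ships with standard R installations. The data here are simulated, and the splitting rule (warm and not too wet) and the 5% label noise are assumptions for illustration only. A random forest is omitted because it would need an additional package.

```r
library(rpart)

set.seed(7)

# Simulated predictors (hypothetical ranges).
n <- 300
d <- data.frame(
  temp   = runif(n, 5, 30),
  precip = runif(n, 200, 1500)
)

# Assumed rule: species present where it is warm and not too wet,
# with about 5% of the labels flipped as noise.
truth <- d$temp > 15 & d$precip < 900
flip  <- runif(n) < 0.05
d$present <- factor(ifelse(xor(truth, flip), 1, 0))

# method = "class" fits a classification tree because the response is a factor.
mdl_tree <- rpart(present ~ temp + precip, data = d, method = "class")
print(mdl_tree)

# Predicted class for each training point and the resulting training accuracy.
pred <- predict(mdl_tree, newdata = d, type = "class")
accuracy <- mean(pred == d$present)
print(accuracy)
```

Training accuracy is optimistic; a held-out test set gives a fairer estimate of how the tree performs on new data.
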
Evaluate
- Complete the modeling workflow with the steps to evaluate model performance and calibrate model parameters.
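
One way to evaluate model performance is to hold out a test set and score predictions on it. The sketch below uses base R only, with simulated data; the 80/20 split, the 0.5 classification threshold, and the logistic model are assumptions chosen for illustration, not the lab's settings.

```r
set.seed(99)

# Simulated presence data with one predictor (assumed relationship).
n <- 400
temp <- runif(n, 5, 30)
present <- rbinom(n, 1, plogis(-6 + 0.4 * temp))
d <- data.frame(present, temp)

# Split into training (80%) and test (20%) sets.
train_idx <- sample(n, size = round(0.8 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

# Fit on the training data, predict probabilities on the test data.
mdl  <- glm(present ~ temp, family = binomial, data = train)
prob <- predict(mdl, newdata = test, type = "response")

# Confusion matrix and accuracy at a 0.5 threshold.
pred_class <- as.integer(prob > 0.5)
conf_mat <- table(observed = test$present, predicted = pred_class)
accuracy <- mean(pred_class == test$present)

# AUC computed from ranks (the Mann-Whitney U statistic scaled to [0, 1]).
r  <- rank(prob)
n1 <- sum(test$present == 1)
n0 <- sum(test$present == 0)
auc <- (sum(r[test$present == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

print(conf_mat)
print(c(accuracy = accuracy, auc = auc))
```

Accuracy depends on the chosen threshold, while AUC summarizes how well the predicted probabilities rank presences above absences across all thresholds.
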

Packages


caret	dismo	dplyr
DT	GADMTools	GGally
ggplot2	here	htmltools
leaflet	maptools	mapview
pdp	purrr	ranger
raster	readr	rgbif
rgdal	rJava	rpart
rsample	sdmpredictors	sf
skimr	spocc	tidyr
usdm	vip


juliaparish/gbif_species_distribution
