juliaparish/gbif_species_distribution

Environmental Data Science 232 Machine Learning Lab 01 Assignment

Overview

This machine learning analysis was completed as an assignment for my Master’s program course, Environmental Data Science 232: Machine Learning. Our professor, Dr. Ben Best, assigned it as an introduction to machine learning: predicting the presence of a chosen species from observations and environmental data found on the Global Biodiversity Information Facility site. It follows the guidance in "Species distribution modeling | R Spatial".

My chosen species is coyote brush (Baccharis pilularis), which is native to the west coast of North America (Oregon, California, and Baja California, Mexico). It is a shrub in the Asteraceae (sunflower) family with oblanceolate to obovate toothed leaves and panicle-like inflorescences of staminate flowers that mimic snow when mature. The plant is generally sticky (not a pun).

Learning Objectives

  • Explore

    • Fetch species observations from the Global Biodiversity Information Facility (GBIF.org) using an R package that wraps its API.
    • Fetch environmental data to define the environmental relationships in the species distribution model (SDM).
    • Generate pseudo-absence (background) points to contrast with the species presence points in the SDM.
    • Extract the underlying environmental data at each point.
    • Plot term plots of each environmental predictor against the species response.
    • Plot a pairs plot to show correlation between variables and avoid multicollinearity (see 8.2 Many predictors in a model).
  • Logistic Regression, seen as an evolution of techniques

    • Linear Model shows the simplest multivariate regression, but its predictions can fall outside the binary range of 0 to 1.
    • Generalized Linear Model uses a logit transformation to constrain predictions to between 0 and 1.
    • Generalized Additive Model allows for "wiggle" in predictor terms.
    • Maxent (Maximum Entropy) is a presence-only modeling technique that allows for a more complex set of shapes between predictor and response.
  • Decision Trees, used as a classification technique on the data, with a categorical response (factor(present))

    • Recursive Partitioning (rpart()). Originally called classification and regression trees (CART), but that name is trademarked (Breiman, 1984).
    • Random Forest (RandomForest()). Actually an ensemble model, i.e. a forest of many decision trees.

  • Complete the modeling workflow with the steps to evaluate model performance and calibrate model parameters.
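
The progression in these objectives, from a linear model to a logistic GLM to a classification tree, followed by an evaluation step, can be sketched on simulated data. This is a minimal illustration and not the lab's actual code: the predictors `temp` and `precip`, their ranges, the coefficients, and the 80/20 split are all assumptions made for the example, and no GBIF observations are fetched. GAM, Maxent, and random forest are left out to keep the sketch short.

```r
# Minimal sketch on simulated data (NOT the GBIF observations used in the lab).
library(rpart)  # recommended package shipped with standard R installations

set.seed(42)
n <- 400

# Hypothetical environmental predictors standing in for extracted raster values
d <- data.frame(
  temp   = runif(n, 5, 25),     # assumed: mean annual temperature, degrees C
  precip = runif(n, 200, 1200)  # assumed: annual precipitation, mm
)

# Simulate presence (1) vs. background (0) from a known logistic relationship
p_true    <- plogis(-4 + 0.35 * d$temp - 0.002 * d$precip)
d$present <- rbinom(n, size = 1, prob = p_true)

# Hold out 20% of the rows for evaluation
test_idx <- sample(n, size = n * 0.2)
train    <- d[-test_idx, ]
test     <- d[test_idx, ]

# 1. Linear model: predictions are unbounded, so they may leave [0, 1]
mdl_lm  <- lm(present ~ temp + precip, data = train)
pred_lm <- predict(mdl_lm, newdata = test)

# 2. GLM with a logit link: response-scale predictions stay within (0, 1)
mdl_glm  <- glm(present ~ temp + precip,
                family = binomial(link = "logit"), data = train)
pred_glm <- predict(mdl_glm, newdata = test, type = "response")

# 3. Classification tree on the categorical response, factor(present)
train$present_f <- factor(train$present)
mdl_rp  <- rpart(present_f ~ temp + precip, data = train, method = "class")
pred_rp <- predict(mdl_rp, newdata = test, type = "class")

# Evaluate on the held-out rows: share of correct presence/background calls
acc_glm <- mean((pred_glm > 0.5) == test$present)
acc_rp  <- mean(as.integer(as.character(pred_rp)) == test$present)

print(range(pred_lm))   # compare against the 0-1 range
print(range(pred_glm))
print(c(glm = acc_glm, rpart = acc_rp))
```

Comparing `range(pred_lm)` with `range(pred_glm)` shows why the logit link matters for a binary response. The 0.5 threshold used for the GLM accuracy is itself an arbitrary choice that would normally be calibrated during evaluation.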

Packages

caret dismo dplyr
DT GADMTools GGally
ggplot2 here htmltools
leaflet maptools mapview
pdp purrr ranger
raster readr rgbif
rgdal rJava rpart
rsample sdmpredictors sf
skimr spocc tidyr
usdm vip
