
RStudio::conf(2022) Resources and Materials for the "Making Data Pipelines in R" Talk.


Meghansaha/pipelines_in_R


Making Data Pipelines in R: A Story From A Self Taught Perspective Resources and Materials on a black and gray blotchy background

Resources for Making Data Pipelines in R

About

This repository contains supplemental resources and materials that coincide with the "Making Data Pipelines in R" talk at RStudio::conf(2022).

"Making Data Pipelines in R" is a story that aims to present high-level concepts that can be useful for creating a data pipeline in R from scratch, told from the perspective of a user who is self-taught in the R programming language. This context is relevant because any programmer (especially a self-taught one) can have knowledge gaps that may make creating automated data pipelines daunting, if not impossible.

This repository can serve as a general learning tool, resource, and source of inspiration for those who want to begin creating data pipelines from scratch in R. Data pipelines are an expansive, fluid topic that varies by industry, organizational setting, and professional use case. Your mileage may vary. Feel free to use anything in this repository that may help you in your own pipeline adventures!

A visual artistic representation of a data pipeline in stages as you start from creating it, to investigating the data, to building structure into it, adding data validations, and creating and maintaining sustainability.


Slides and Talk Recording

The slides for "Making Data Pipelines in R" can be found on the repository here.


This talk was presented on July 27th, 2022, at 1:30 PM EST at the Gaylord National Convention Center in National Harbor (Maryland/D.C.), United States. A recording of this talk is available here.



Example (Non-Technical) Documents (First Investigations and External Environment)

Example (non-technical) documents, such as metadata tables and data workflow diagrams, can be used to disseminate general pipeline information. These documents can be found for modification and download here.



Example R Scripts (Internal Environment)

Example R Projects and scripts can be found on the repository here.

For more information about how to fork, clone, or pull down repositories on GitHub for your own practice or use, please refer to this Git Docs Article.

More Complicated Example

For those looking for more complicated scripts that exercise knowledge of intermediate script modularization, custom functions, and script chaining, you may be interested in the example R Project "simple_pipeline" located here.

Simpler Example

For those who want a lighter introduction to chaining scripts together without worrying about intermediate knowledge of working directories, you may be interested in the example R project "even_simpler_pipeline" located here.
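
As a rough idea of what "chaining scripts together" can look like, here is a minimal sketch that runs two tiny scripts in order with base R's `source()`. It is not taken from the "even_simpler_pipeline" project: the folder name `toy_pipeline`, the script names `01_import.R` and `02_clean.R`, and the objects `raw_data` and `clean_data` are all hypothetical and made up for illustration. The scripts are written to a temporary folder so the example runs on its own without any working directory setup.

```r
# Hypothetical example: folder, script names, and objects are invented
# for illustration and do not come from the repository's projects.
pipeline_dir <- file.path(tempdir(), "toy_pipeline")
dir.create(pipeline_dir, showWarnings = FALSE)

# Step 1: a script that "imports" data (here, a built-in dataset).
writeLines(
  "raw_data <- head(mtcars, 5)",
  file.path(pipeline_dir, "01_import.R")
)

# Step 2: a script that relies on the object created in step 1.
writeLines(
  "clean_data <- raw_data[, c('mpg', 'cyl')]",
  file.path(pipeline_dir, "02_clean.R")
)

# Chain the scripts: the numeric prefixes set the run order.
scripts <- sort(list.files(pipeline_dir, pattern = "\\.R$", full.names = TRUE))

# Run everything in one shared environment so later scripts can see
# the objects that earlier scripts created.
pipeline_env <- new.env()
for (script in scripts) {
  message("Running ", basename(script))
  source(script, local = pipeline_env)
}

# The result of the last step is now available in pipeline_env.
print(pipeline_env$clean_data)
```

Numbering the file names is one simple way to make the run order obvious to anyone opening the folder. Sourcing into a dedicated environment is a design choice rather than a requirement; sourcing into the global environment also works but makes it easier for leftover objects from earlier runs to leak into the pipeline.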



Other Relevant R Resources (Validation and Sustainability)

A breakdown of R documentation, packages, and other references that can be useful for making data pipelines in R can be found here.
