[Image: Making Data Pipelines in R: A Story From A Self Taught Perspective, Resources and Materials, on a black and gray blotchy background]

Resources for Making Data Pipelines in R



Working with Directories in R

Validation/Cleaning

As admitted in the Making Data Pipelines in R Talk, I only learned how important it is to check that my data is clean and as expected when I introduced data validation late in my journey. Consequently, I am by no means an expert in data validation and am still learning about it. The following resources helped me figure out what I needed at the time, and they are also on my to-do list to revisit and read more thoroughly.

Specific Resources for Performing Data Validations in R

Specific Resources for Cleaning Data in R

General Resources for Data Validation

Sustainability

As mentioned in the Talk, sustainability can look very different depending on the context of the pipeline. In my case, sustainability meant not only documenting the pipeline, but also making the code understandable to non-programmers through non-technical documents. The following are links to versions of documents I have used myself, as well as further reading that may be helpful when thinking about the sustainability of your pipeline in R.

  • The dataReporter Package (formerly known as dataMaid) by Claus Ekstrøm and Anne Petersen - Useful for generating codebooks and reports on your data.

  • The flowr Package by Sahil Seth - Useful for experimenting with visualizing workflows in R.

  • Codebook Template - Useful for thinking about what to put into a codebook. This is a Word document. If you'd like, you could recreate it in R Markdown or another program.

  • Workflow Reference - Useful for inspiration about visualizing and describing workflows in your pipeline. This was originally created in Canva, but any visualization software, or even Microsoft PowerPoint, would suffice.

  • Data Map Template - An editable template for visualizing datasets at a more granular level, similar to SQL schema visualizations. It can be created in any visualization software; this one was created in PowerPoint for ease of sharing.
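
If you would rather start a codebook in R than in Word, the idea can be sketched in base R. This is only a minimal illustration, not the layout of the Codebook Template above: the `survey` data, its column names, and the `make_codebook()` helper are all hypothetical, and the chosen columns (type, missing count, distinct count, a blank description) are just one guess at what a codebook might record.

```r
# Hypothetical example data; the column names are made up for illustration.
survey <- data.frame(
  id = 1:4,
  age = c(34, 51, NA, 28),
  region = c("north", "south", "south", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: build one codebook row per column of a data frame.
make_codebook <- function(df) {
  data.frame(
    variable = names(df),
    # First class of each column, e.g. "integer" or "character"
    type = vapply(df, function(col) class(col)[1], character(1)),
    # Number of missing values in each column
    n_missing = vapply(df, function(col) sum(is.na(col)), integer(1)),
    # Number of distinct non-missing values in each column
    n_unique = vapply(df, function(col) length(unique(col[!is.na(col)])), integer(1)),
    # Left blank on purpose: plain-language notes are written by hand
    description = "",
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

codebook <- make_codebook(survey)
print(codebook)
```

The `description` column is left empty so that a person can fill in plain-language notes for non-programmers. A table like this could then be written out with `write.csv()` or dropped into an R Markdown document.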