Companion code repository for the O'Reilly "Python and R for the Modern Data Scientist" book.

moderndatadesign/PyR4MDS

Python and R for the Modern Data Scientist Code

Welcome

Welcome to the companion code repository for the O'Reilly book Python and R for the Modern Data Scientist. You can also access this repository as an RStudio Cloud project (account required).

Success in data science depends on the flexible and appropriate use of tools. That includes Python and R, two of the foundational programming languages in the field. With this book, data scientists from the Python and R communities will learn how to speak the dialects of each language. By recognizing the strengths of working with both, you'll discover new ways to accomplish data science tasks and expand your skill set.

Authors Rick J. Scavetta and Boyan Angelov explain the fundamentals of these languages and highlight where each one excels over the other, whether in linguistic features or in the power of its open source ecosystem. Not only will you learn how to use Python and R together in real-world settings, but you'll also broaden your knowledge and job opportunities by working as a bilingual data scientist.

  • Learn Python and R from the perspective of your current language
  • Understand the strengths and weaknesses of each language
  • Identify use cases where one language is better suited than the other
  • Understand the modern open source ecosystem available for both, including packages, frameworks, and workflows
  • Learn how to integrate R and Python in a single workflow
  • Follow a real-world case study that demonstrates ways to use these languages together

Repository structure

When available, companion scripts to the book are found in their respective chapter directories.

Part II. Levels of working together I: Bilingual

Part III. Modern Context

Part IV. Levels of working together II: Synergy

Appendix A. Bilingual Dictionary

Datasets

Datasets used in the book can be found as follows.

Diamonds

This dataset is from the R ggplot2 package:

library(ggplot2)
data(diamonds)

Iris & Plant Growth

These are available in base R:

data(PlantGrowth)
data(iris)

Boston housing

This dataset is available through the Python scikit-learn package:

# Note: load_boston was removed in scikit-learn 1.2, so this requires an older version.
from sklearn.datasets import load_boston
boston_data = load_boston()

Amazon music reviews

The Amazon music review data can be downloaded here. We use the "digital music" subset.

Swimming pool and car detection

This dataset on swimming pool and car detection using satellite imagery is available on Kaggle.

Daily Australian Temperatures

The daily Australian temperatures dataset can be downloaded directly from GitHub.

Loxodonta africana species occurrence data

Obtain this data and the spatial raster (the bioclimatic variables) using the R sdmbench package:

library(sdmbench)
data <- get_benchmarking_data("Loxodonta africana")

This object is a list that contains the occurrence data in data$df_data and the raster layers in data$raster_data.

Shared car locations data

These data can be downloaded from Kaggle.

Wildfires

The wildfires data can be downloaded from the USDA website directly or from Kaggle. To run the case study, add the file FPA_FOD_20170508.sqlite to the ch07-case-study/data/ folder.
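Before running the case study, it can help to confirm that the SQLite file is in the expected place and can be read. The following is a minimal sketch using only the Python standard library; the helper name `list_tables` is hypothetical and not part of the book's code, and no table names are assumed: it simply reports whatever tables the file contains.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return the sorted names of all tables in the SQLite file at db_path.

    Hypothetical helper for illustration; not part of the book's code.
    """
    path = Path(db_path)
    # sqlite3.connect would silently create an empty database if the file
    # were missing, so check for the file explicitly first.
    if not path.is_file():
        raise FileNotFoundError(f"Expected the wildfires database at {path}")
    conn = sqlite3.connect(str(path))
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example call, run from the repository root:
# list_tables("ch07-case-study/data/FPA_FOD_20170508.sqlite")
```

The explicit file check is a deliberate choice: without it, a misplaced file would produce an empty database and an empty table list rather than a clear error.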

Star Wars

This dataset is from the R dplyr package:

library(dplyr)
data(starwars)