\documentclass[12pt]{article}
%\input{/home/grant/Dropbox/LaTeX/preamble} %% Rather use self-contained preamble below
%% LAYOUT AND TITLES
\usepackage{setspace}
\onehalfspacing
\usepackage[margin=1.1in]{geometry}
\setlength{\parindent}{0pt}
\setlength{\parskip}{10pt}
\usepackage{titling}
\newcommand{\subtitle}[1]{%
\posttitle{
\par\end{center}
\begin{center}\large#1\end{center}
\vskip0.5em}
}
%% Change title format to be more compact
%\usepackage{titling}
%\setlength{\droptitle}{-2em}
% \title{Syllabus}
% \pretitle{\vspace{\droptitle}\centering\huge}
% \posttitle{\par}
% \author{Grant R. McDermott}
% \preauthor{\centering\large\emph}
% \postauthor{\par}
% \predate{\centering\large\emph}
% \postdate{\par}
% \date{}
%% FONTS
\usepackage[normalem]{ulem} %% For strikeout font: \sout{}
\usepackage{lmodern}
\usepackage{amssymb, amsmath}
\usepackage{fontspec}
% % See: https://tex.stackexchange.com/a/50593
\setmainfont[]{Fira Sans}
\setsansfont[]{Fira Sans}
\setmonofont[]{Fira Mono}
% \setmonofont[Mapping=tex-text]{inconsolata}
\defaultfontfeatures{
Path = /usr/share/texmf-dist/fonts/opentype/public/fontawesome/ }
\usepackage{fontawesome} % Ditto
%% MISC
\usepackage[colorlinks = true,
linkcolor = black,
urlcolor = blue,
citecolor = blue,
anchorcolor = black]{hyperref}
\usepackage{tabularx}
\usepackage{booktabs}
\begin{document}
\title{Data science for economists \\(EC 607)}
\subtitle{\textsc{Winter 2021 syllabus}\vspace{-2ex}}
\author{Grant R. McDermott\\ Dept. of Economics, University of Oregon}
%\date{} % Toggle commenting to test
\date{\vspace{-5ex}}
\maketitle
\section*{Summary}
\begin{tabular}{ll}
\textbf{When:} & Tue \& Thu, 10:15--11:45 \\
% \textbf{Where:} & PLC 410 \\
\textbf{Where:} & Remote! A Zoom link will be sent to you. \\
\textbf{Web:} & \href{https://github.com/uo-ec607}{https://github.com/uo-ec607} \\
\textbf{Who:} & Grant McDermott \\
& \, \faMortarBoard \, Assistant Professor of Economics \\
& \, \faEnvelopeO \, \href{mailto:grantmcd@uoregon.edu}{grantmcd@uoregon.edu} \\
& \, \faHourglassHalf \, Mon \& Wed, 09:00--10:30 \\
\end{tabular}
\section*{Course description}
This seminar is targeted at economics PhD students and will introduce you to the modern data science toolkit. While some material will likely overlap with your other quantitative and empirical methods courses, this is not just another econometrics course. Rather, my goal is to bring you up to speed on the practical tools and techniques that I feel will most benefit your dissertation work and future research career. This includes many of the seemingly forgotten skills --- like where to find interesting data sets in the ``wild'' and how to actually clean them --- that are crucial to any successful scientific project, but are typically excluded from core econometrics and statistics classes. We will cover topics like version control and effective project management; programming; data acquisition (e.g. web-scraping), cleaning and visualization; GIS and remote sensing products; and tools for big data analysis (e.g. relational databases, cloud computation and machine learning). In short, we will cover things that I wish someone had taught me when I was starting out in graduate school. %While the data sets and materials focus will predominantly link to environmental and natural resource issues (my own fields of specialisation), the tools and methods apply broadly. Students from other fields of specialisation are thus welcome to register.
\newpage
\section*{Practical matters}
\subsection*{Class rules}
\sout{Please bring your laptops to class. This will be a very hands-on course, with relatively little in the way of formal theory. Instead, we'll be working through lecture notes together in class and you'll be running code on your own machines.} \textbf{Update:} With COVID-19 pushing us to remote classes, I'll be changing how I teach this course. The most important change is that I'll be delivering lectures \textit{asynchronously}, essentially flipping the classroom. My expectation is that you'll watch and work through the pre-recorded lecture videos before class. We'll reserve actual class time for two things: 1) student presentations and 2) troubleshooting and follow-up from any of the lecture material. We may need to adapt as the quarter develops, but that's the plan we'll start with.
\subsection*{Software requirements}
All of the software requirements for this course are open-source and/or free. Please aim to have everything installed by the start of our first lecture. I will be available for installation troubleshooting during the first week of the quarter. If you want a detailed tutorial on how to achieve a perfect working setup, I can think of no finer guide than Jenny Bryan \textit{et al}.'s \url{http://happygitwithr.com/} (see esp. sections 4--15).
\vspace{-0.25cm}
\subsubsection*{\textit{R} and RStudio}
We will mainly be using the statistical programming language \textbf{\textit{R}} (download \href{https://www.r-project.org/}{here}). %Indeed, at one level, this seminar could be seen as a crash-course in learning and using \textit{R} to answer economics-based questions.
Please make sure that you install the \textbf{RStudio IDE} too (download \href{https://www.rstudio.com/products/rstudio/download/preview/}{here}).
\vspace{-0.25cm}
\subsubsection*{Git and GitHub Classroom}
We will also make extensive use of the \textbf{Git} version control system (follow the OS-specific installation instructions \href{http://happygitwithr.com/install-git.html}{here}). Once you have installed Git, please create an account on \textbf{GitHub} (\href{https://github.com/join}{here}) and register for an education discount to get unlimited private repos (\href{https://education.github.com/discount_requests/new}{here}).\footnote{GitHub recently \href{https://blog.github.com/changelog/2019-01-08-pricing-changes/}{announced} unlimited free private repos for everyone. However, you are limited to three collaborators per private repo, so the education discount still makes sense.} Now is probably a good time to tell you that I am going to run the course through \href{https://classroom.github.com/}{GitHub Classroom}. You will receive an email invitation to the course repo with instructions in due time, but suffice it to say that this is how we'll submit assignments, provide feedback, receive grades, etc.
\vspace{-0.25cm}
\subsubsection*{Other}
You are ready to start this course once you have installed R, RStudio, and Git (as well as created an account on GitHub). The last thing I want you to do for now is make sure that your system is configured to handle some additional packages that we will be using down the line. This varies by operating system:
\begin{itemize}
\item \textbf{Linux:} You should be good to go.
\item \textbf{Mac:} Install the \href{https://brew.sh/}{Homebrew} package manager. I also recommend that you make sure your C++ toolchain is configured/open. Don't worry, it's simpler than it sounds. Just download the \href{https://github.com/rmacoslib/r-macos-rtools#installer-package-for-macos-r-toolchain-}{macOS Rtools installer} and follow the instructions.
\item \textbf{Windows:} Install \href{https://cran.r-project.org/bin/windows/Rtools/}{Rtools}. While it's not essential, I also recommend that you install the \href{https://chocolatey.org/}{Chocolatey} package manager for Windows.
\end{itemize}
I will provide instructions for any further software requirements as the need arises; i.e. when we get to the relevant lecture. On that note, the lectures have all been posted ahead of time on the \href{https://github.com/uo-ec607}{course website}. Each lecture lists all of the \textit{R} packages and external libraries (if relevant) required for a particular class. I'll try to remind you, but my expectation is that you will look at these requirements and ensure that you have them installed \textit{before} we start class.
\subsection*{Textbook and other readings}
There's no set textbook for this course (Ed Rubin and I are working on one). The lecture notes are pretty detailed and are thus ``self-contained''. However, I've drawn inspiration from various sources, a few of which are listed below. You don't \textit{need} to buy or read any of these (excellent) books to complete the course. But I can eagerly recommend leafing through at least one or two of them. Each of these books is freely available online if you can't afford a hard copy:
%
\begin{itemize}
\item ``\href{http://socviz.co/}{\textbf{Data Visualization: A practical introduction}}'' (Kieran Healy)
\item ``\href{http://r4ds.had.co.nz}{\textbf{\textit{R} for Data Science}}'' (Garrett Grolemund and Hadley Wickham)\footnote{FWIW, Jake VanderPlas's ``\href{https://jakevdp.github.io/PythonDataScienceHandbook/}{\textbf{Python Data Science Handbook}}'' is an excellent option for anyone looking for a Python equivalent.}
\item ``\href{https://adv-r.hadley.nz/}{\textbf{Advanced \textit{R}}}'' (Hadley Wickham)
\item ``\href{https://geocompr.robinlovelace.net/}{\textbf{Geocomputation with \textit{R}}}'' (Robin Lovelace, Jakub Nowosad and Jannes Muenchow)
\item ``\href{https://keen-swartz-3146c4.netlify.app/}{\textbf{Spatial Data Science}}'' (Edzer Pebesma and Roger Bivand)
\item ``\href{https://statlearning.com}{\textbf{An Introduction to Statistical Learning}}'' (Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani)
\item Etc.
\end{itemize}
% The nearest thing to a conventional textbook for this course is probably Garrett Grolemund and Hadley Wickham's ``\href{http://r4ds.had.co.nz}{\textbf{\textit{R} for Data Science}}'' (R4DS). I have ordered some copies at the Duck Store, but the book is available in its entirety for free online. I highly recommend this book for anyone who is interested in using \textit{R} for their research.\footnote{For those of you who prefer Python to \textit{R}, Jake VanderPlas's ``\href{https://jakevdp.github.io/PythonDataScienceHandbook/}{\textbf{Python Data Science Handbook}}'' is another excellent option.} Which, let's be honest, you should be. %Only dinosaurs are using Stata now. (Don't tell the other professors. Actually, who am I kidding: TELL THEM.)
%
% In truth, R4DS will mostly cover the introductory parts of this course, Other books that I eagerly recommend and will be drawing on occasionally include ``\href{https://adv-r.hadley.nz/}{\textbf{Advanced \textit{R}}}'' (Hadley Wickham, again), ``\href{http://socviz.co/}{\textbf{Data Visualization: A practical introduction}}'' (Kieran Healy), and ``\href{https://geocompr.robinlovelace.net/}{\textbf{Geocomputation with \textit{R}}}'' (Robin Lovelace, Jakub Nowosad and Jannes Muenchow). These books are all freely available online too. I may also refer you to the \href{http://stat545.com/topics.html}{\textbf{STAT 545 website}}, which is a course initially taught at UBC by Jenny Bryan and continues to serve as an incredible knowledge resource for all things related to \textit{R} and reproducible research. Finally, if we get enough time to take a deep dive into machine learning, then I'll be drawing from ``\href{https://web.stanford.edu/~hastie/ElemStatLearn/}{\textbf{The Elements of Statistical Learning}}'' (Trevor Hastie, Robert Tibshirani, and Jerome Friedman), which is a classic and (surprise!) also available as a free PDF online.\footnote{A new book that I really like the look of is ``\href{https://bradleyboehmke.github.io/HOML/}{\textbf{Hands-On Machine Learning with \textit{R}}}'' (Boehmke and Greenwell).}
Taking a step back, one of the goals of this course is to make you aware of the incredible array of instructional material that is freely available online. I also want to encourage you to be entrepreneurial. In that spirit, many of the lectures will follow a tutorial on someone's blog, or involve reproducing an existing study with open source tools. Each lecture will come with a set of recommended readings, which I expect you to at least look over before class.
%\newpage
\section*{Evaluation and grading}
\subsection*{Grade determination}
%Grades will be determined according to a mix of regular assignments and in-class presentations. You will also be expected to evaluate each others' code and provide constructive feedback for improvements. There will be no final exam, although you may be asked to give a final presentation on a topic TBD.
Grades will be determined as follows:
\begin{table}[!h] \centering
%\caption{\textsc{grades} }
\label{tab:grades}
\begin{tabularx}{0.5\textwidth}{Xr}
\toprule
% \multicolumn{2}{c}{EC 607} \\
% \midrule
4 $\times$ homework assignments (20\% each) & 80\% \\
1 $\times$ short presentation & 10\% \\
1 $\times$ OSS contribution & 10\% \\
\bottomrule
\multicolumn{2}{>{\hsize=\dimexpr1\hsize+6\tabcolsep}X}{\footnotesize Note: A class participation bonus worth an additional 2.5\% will be awarded at my discretion.}\\
\end{tabularx}
\end{table}
This breakdown should (hopefully) be pretty self-explanatory. Specific requirements will be made clear as we proceed through the course. Here are some additional details, though:
\vspace{-0.25cm}
\subsubsection*{Homework assignments (and/or final presentation)}
Homework assignments are to be completed individually. Late submissions will not be graded. There is no final exam or project for this course. However, you have the option of swapping out one of the individual homework assignments for a final (20 min) presentation of your own research. Think of this as an opportunity to develop and refine one of your PhD projects using the tools that we will cover in this course. In particular, some of you may wish to present your second-year field paper, or a dissertation chapter idea. You are allowed to do this individually or in pairs. However, please note the following caveats: 1) You need to get prior approval from me and let me know which HW assignment you are dropping. 2) These final presentations will only be graded on content relevant to this course. (Don't present a theory paper!)
\vspace{-0.25cm}
\subsubsection*{Short presentations}
Almost every lecture will begin with a short student presentation. These should last 5--10 minutes and will cover a prescribed topic (i.e. one that I have either allocated or approved ahead of time). Some presentations will involve summarising a key reading or topic of relevance to the main lecture for that day. Other presentations are a chance for you to describe a software package or analytical method of your choice --- again, subject to my approval. I will provide a list of slots, as well as prescribed and suggested topics, via GitHub Classroom. Topics will be assigned on a first-come, first-served basis. But don't be surprised if I volunteer you for something.
%Most lectures have one or more key readings; see the \nameref{sec:outline} at the end of this document. Each of you must give a short (5-10 min) summary presentation on at least one of these key readings. I say ``at least one'' because --- while you will need to give two short presentations in total --- you also have the option to present on an (approved) software package or tool of your choice.\footnote{I'll provide a list of some suggested packages and tools on the course repo.} Topics will be assigned on a first-come-first-go basis. But don't be surprised if I volunteer you for something.
\vspace{-0.25cm}
\subsubsection*{OSS contribution}
You are going to contribute to open-source software (OSS) in some way, shape, or form. This could be by identifying and correcting bugs in a package that you use. Or, it could be by contributing material (e.g. documentation) to an open-source project. I particularly want to encourage you to contribute to the Library of Statistical Techniques (\url{https://lost-stats.github.io/}). There's clearly quite a bit of leeway here and I'll need to sign off on whatever you propose. Similarly, depending on the scope and size, you may need to make several different contributions to fulfill the requirement.
%You are going to peer-review (or reproduce) a study, project or software package. The focus here is on code and analysis, rather than framing or narrative issues. How exactly I expect you to do this will become clear after the first few lectures. The gist is that you will be using GitHub and related tools. (E.g. Cloning or forking a repo, identifying bugs or missing dependencies, issuing pull requests, and so forth. Again, these terms will make more sense once we cover them in class.) An approach that worked well last year --- but depends on demand for final presentations --- is that students reviewed each others' field papers. You could also choose to review any open-source project or repo, including \href{https://github.com/grantmcdermott?tab=repositories}{my own}. You will have 5 minutes to present your main findings/contributions and will also need to share any code changes/contributions with me.
\subsection*{Honesty and academic integrity}
Students caught cheating or plagiarizing will automatically be assigned a zero grade. Please acquaint yourself with the Student Conduct Code at \url{http://studentlife.uoregon.edu}.
\subsection*{Accessibility}
If you have a documented disability and anticipate needing accommodations in this course, please make arrangements with me during the first week of the term. Please also request that the \href{https://aec.uoregon.edu/}{Accessible Education Center} send me a letter verifying your disability. Students with infants or young children who need ongoing care should similarly come and see me. We'll have to take it on a case-by-case basis, but I'll do my utmost to accommodate you.
\newpage
\section*{Lecture outline}
\label{sec:outline}
%\textit{Note: Key readings in italics. $^* = $ potential short presentation topic.}
\subsection*{Data science basics}
\begin{enumerate}
\item Introduction: Motivation, software installation, and data visualization
\item Version control with Git(Hub)
\item Learning to love the shell
\item \textit{R} language basics
\item Data cleaning and wrangling: (1) tidyverse and (2) data.table
\item Webscraping: (1) Server-side and CSS
\item Webscraping: (2) Client-side and APIs
\end{enumerate}
\subsection*{Analysis and programming}
\begin{enumerate}
\setcounter{enumi}{7}
\item Regression analysis in \textit{R}
\item Spatial analysis in \textit{R}
\item Functions in \textit{R}: (1) Introductory concepts
\item Functions in \textit{R}: (2) Advanced concepts
\item Parallel programming
\end{enumerate}
\subsection*{Scaling up: Big data and cloud computation}
\begin{enumerate}
\setcounter{enumi}{12}
\item Docker
\item Virtual machines / cloud servers (Google Compute Engine)
\item High performance computing (UO Talapas cluster)
\item Databases: SQL(ite) and BigQuery
\item Spark
\item Machine learning: (1)
\item Machine learning: (2)... Or, student project presentations (demand dependent)
\item Peer-review and student project presentations (demand dependent)
\end{enumerate}
\newpage
\section*{FAQ}
\vspace{-0.25cm}
\subsubsection*{This course looks interesting! Can I use/adapt your lecture notes for a similar course that I'm teaching at XYZ?}
Sure. I've benefited greatly from other people making their teaching materials publicly available (and have tried my best to acknowledge them directly in the relevant sections of this course). To say nothing of the incredible open-source software that powers everything. I'm more than happy to pay it forward. I only ask two favours. 1) Please let me know (\href{mailto:grantmcd@uoregon.edu}{email}/\href{https://twitter.com/grant_mcdermott}{Twitter}) if you do use material from this course, or have found it useful in other ways. 2) A minor acknowledgment somewhere in your own syllabus or notes would be much appreciated.
\vspace{-0.25cm}
\subsubsection*{The other data science courses that I've seen all have at least one whole lecture dedicated to data visualization. Where's yours?}
Every lecture in this course is dedicated to data visualization! Okay, seriously, we'll cover the basics of \href{https://ggplot2.tidyverse.org/}{ggplot2} in the opening lecture (and first assignment) and consistently build upon that in subsequent weeks. Much as I'm tempted to have a standalone lecture on the topic, I have to triage because of the time constraints of a 10-week course. I don't want to run out of road before we can get to some of the big data stuff towards the end of the course. Trust me, though. There will be a \textit{lot} of data visualization in this course.
\vspace{-0.25cm}
\subsubsection*{What about regular expressions? I hear those are super important too.}
100\% agree and, much like data visualization, I've tried to include examples throughout the course rather than in a standalone lecture. I'm confident that you will have a solid grasp of the basics by the time we get to the end of the quarter.
\vspace{-0.25cm}
\subsubsection*{I hear that data scientists use Bayesian methods a lot. Will you be covering those in depth?}
Sadly, no. I'm a Bayes fanboy (as my research interests will attest), but again have to think about time constraints. The good news is that running Bayesian models in \textit{R} is super easy thanks to a multitude of packages, and you will be very well positioned to jump right into these after finishing this course. We might even get to an example or two in the lecture on regression analysis. The even better news is that \href{https://pages.uoregon.edu/jpiger/}{Jeremy Piger} teaches an excellent Bayesian course here at the UO that you should attend.
\vspace{-0.25cm}
\subsubsection*{Is there anything else that you aren't covering that I should know about?}
The obvious thing that springs to mind is workflow automation and analysis pipelines (Makefiles, etc.). Again, triage rules the day. We will, however, be working extensively with R Markdown documents, which are at least a big step in the direction of self-contained analysis. And I'm more than happy to point students in the right direction if anyone wants to learn more. (\href{http://stat545.com/Classroom/notes/cm109.nb.html}{Here}, \href{https://ropenscilabs.github.io/drake-manual/index.html}{here}, and \href{https://web.stanford.edu/~gentzkow/research/CodeAndData.pdf}{here} are great places to start.) Another thing we won't have time for is package development and maintenance, although I don't see this class as the primary audience for that. OTOH, students will be rewarded for package contributions if they choose to do so in the peer-review section of the course.
\vspace{-0.25cm}
\subsubsection*{\textit{R} looks cool, but I'm more familiar with Python/Julia/etc. Can I use that instead?}
Short answer: No. Longer answer: Look, I like and use those languages too, but I'm not changing my lecture notes or assignment templates for you. Plus, I really do think that \textit{R} makes the most sense for applied economists looking to develop their data science skills. It already has all of the statistics and econometrics support, and is amazingly adaptable as a ``glue'' language to other programming languages and APIs. Learning multiple languages is never a bad idea in the long run, though.
\vspace{-0.25cm}
\subsubsection*{I already have a BitBucket/GitLab/etc. account. Do I still have to use GitHub?}
Since I'm running this course through GitHub Classroom, yes. But good for you! (Seriously... those are great platforms too and as an open-source advocate, I fully support a plurality of tools and software options.)
\vspace{-0.25cm}
\subsubsection*{On that note, do you have any advice for running a course on GitHub Classroom?}
I mostly followed \href{https://github.com/jfiksel/github-classroom-for-teachers}{this excellent tutorial} by Jacob Fiksel.
\end{document}