---
bibliography: tds.bib
output: github_document
# output: word_document
---

# Catalogue

This is the module catalogue for [TRAN5340M](http://webprod3.leeds.ac.uk/catalogue/dynmodules.asp?Y=202021&M=TRAN-5340M), Transport Data Science.

15 credits

Module manager: Dr [Robin Lovelace](mailto:r.lovelace@leeds.ac.uk)

Taught: Semester 2

Year running: 2020/2021

[View Timetable](http://timetable.leeds.ac.uk/teaching/201819/reporting/Individual?objectclass=module&idtype=name&identifier=TRAN5340M01&&template=SWSCUST+module+Individual&days=1-7&weeks=1-52&periods=1-21)

This module is not approved as an Elective.

## Module summary
The quantity, diversity and availability of transport data are increasing rapidly, requiring skills in the management and interrogation of data and databases.
We are in the midst of an accelerating data revolution, and data science has never been more important in the workplace.
Transport planners, consultancies and researchers increasingly need to take data from a wide range of sources and perform non-standard analyses on them to inform the decision-making process.
There is high demand for "skilled technical talent capable of handling and analysing very large datasets compiled from multiple sources" in the transport sector according to the [Transport Systems Catapult](https://ts.catapult.org.uk/news-events-gallery/news/report-warns-of-uk-skills-shortage-in-im-sector/).
This module teaches the skills needed to get a highly paid job in this context.
More importantly, you will be empowered to support sustainable transport policies using new techniques and datasets, ranging from openly available origin-destination datasets to huge datasets from global positioning systems (GPS).
Not only does the module teach data skills, it also teaches the importance of understanding how advanced data analysis, modelling and visualisation can support the global transition away from fossil fuels.

## Objectives
- Understand the structure and origin of large transport datasets
- Develop proficiency in command-line tools for handling large transport datasets
- Gain knowledge of how to obtain, clean and store transport data
- Produce static and interactive data visualisations
- Apply advanced methods, including geographic analysis, modelling and machine learning techniques
- Consolidate transport data science skills in a cohesive project with real-world relevance

<!-- - Learn how to use these skills in the wider context of a dynamic team environment, including project set-up (e.g. deciding between cloud services and local computing power) and problem definition; team workflow; sourcing and interrogating data; applying statistical methods; visualising data and communicating the results (e.g. via the deployment of web services). -->

## Learning outcomes
- Become confident and skilled in working with large and diverse transport datasets
- Identify appropriate datasets and use 'the right tool for the job' from the data scientist's toolkit to answer transport research questions
- Understand the landscape of transport data science applications and the job market, and be able to apply data science skills in applied transport planning, research and consultancy projects

<!-- Specifically, learning outcomes will include the ability to: -->
<!-- - Identify available datasets and access and clean them -->
<!-- - Combine datasets from multiple sources -->
<!-- - Understand what machine learning is, which problems it is appropriate for compared with traditional statistical approaches, and how to implement machine learning techniques -->
<!-- - Visualise and communicate the results of transport data science, and know about setting-up interactive web applications -->
<!-- - Deciding when to use local computing power vs cloud services -->
<!-- - Articulate the relevance and limitations of data-centric analysis applied to transport problems, compared with other methods -->

## Syllabus
- Software for practical data science
- **The structure of transport data**, e.g. flows, incidents, origin/destination, GIS
- Data cleaning and subsetting
- Accessing data from web sources
<!-- - Route assignment with remote routing services and locally installed software -->
- Processing data using remote services and locally installed software
- Data visualisation
- Machine learning
- Professional and ethical issues of big data in transport
- Transport data analysis

## Subject specific skills
Students will gain skills in:

- Importing a range of transport data file formats from the command line
- Setting up data science projects to ensure reproducibility
- Data cleaning and manipulation
- Visualisation of large datasets

<!-- - Routing on road networks -->

## Teaching methods
<p><table border="1" width="100%"><tr><td align="left" width="40%">Delivery type</td><td align="right" width="20%">Number</td><td align="right" width="20%">Length hours</td><td align="right" width="20%">Student hours</td></tr>
<!-- Delivery type | Number | Length hours | Student hours -->
<TR><TD>Lecture</TD><TD align="right">5</TD><TD align="right">1.00</TD><TD align="right">5.00</TD></TR>
<TR><TD>Practical</TD><TD align="right">5</TD><TD align="right">3.00</TD><TD align="right">15.00</TD></TR>
<TR><TD>Seminar</TD><TD align="right">5</TD><TD align="right">1.00</TD><TD align="right">5.00</TD></TR>
<!-- Totals -->
<tr><td colspan="3">Private study hours</td><td align="right">125.00</td></tr><tr><td colspan="3">Total Contact hours</td><td align="right">25.00</td></tr><tr><td colspan="3">Total hours (100hr per 10 credits)</td><td align="right">150.00</td></tr></table>

## Private study
Students are expected to spend their study time on software set-up and worked examples, plus background reading for lectures, preparatory work for workshops and assessed coursework.
Unsupervised teamwork practical sessions will be arranged to help students complete their portfolios.

## Rationale for teaching and learning methods and relationship to learning outcomes
The module blends lectures, seminars and practical sessions designed to deliver the learning outcomes.
Lectures and seminars will be used to teach the core skills and principles (e.g. database design), supported and augmented by independent learning.
A key aim of the module is for students to gain hands-on experience designing, implementing and communicating data analysis workflows during supervised practical sessions, consolidated in unsupervised practical sessions.

## Monitoring of student progress
Progress will be monitored informally during supervised practical sessions.
Formative feedback will be given after an informal interim assessment of each student's in-progress project portfolio (week 4/5).

## Methods of assessment
**Coursework**

The project portfolio will be a short (3,000-word limit, maximum 10 pages excluding appendices) but high-quality report explaining the methods learned and their application to real problems.
Students are encouraged to submit reproducible code alongside the report.

<table border="1" width="100%"><tr><td valign="top" align="left" width="17%">Assessment type</td><td valign="top" align="left" width="65%">Notes</td><td valign="top" align="right" width="18%">% of formal assessment</td></tr><tr><td valign="top" align="left" width="17%">Portfolio</td><td valign="top" align="left" width="65%">Project Portfolio</td><td valign="top" align="right" width="18%">100.00</td></tr><tr><td colspan="2" valign="top" align="left">Total percentage (Assessment Coursework)</td><td valign="top" align="right">100.00</td></tr></table>

<!-- 20% of the mark for the assessment of the portfolio will be determined as an initial assessment after week 5. Feedback will be provided. The remaining 80% will be awarded in respect of the final submitted portfolio. -->

### Other information about assessment
The project portfolio will be used as the basis for formative assessment mid-way through the semester.
This will help identify any students who are struggling.
If a portfolio does not meet the assessment criteria, the student will have the opportunity to resubmit a report outlining what they have learned in the areas where the portfolio fell short.

## Pre-requisites

### Software
For this module you need up-to-date versions of R and RStudio installed on a computer you have access to:

- R from [cran.r-project.org](https://cran.r-project.org/)
- RStudio from [rstudio.com](https://rstudio.com/products/rstudio/download/#download)
- R packages, which can be installed by opening RStudio and typing a command such as `install.packages("stats19")` in the R console

You should have the latest stable release of R (4.0.0 or above) running on a reasonably powerful computer with at least 4 GB of RAM, ideally 8 GB or more.
See [Section 1.5 of the online guide Reproducible Road Safety Research with R](https://itsleeds.github.io/rrsrr/introduction.html#installing-r-and-rstudio) for instructions on how to install key packages we will use in the module.^[
For further guidance on setting up your computer to run R and RStudio for spatial data, we recommend
Chapter 2 of *Geocomputation with R* (the Prerequisites section contains links for installing spatial software on Mac, Linux and Windows: https://geocompr.robinlovelace.net/spatial-class.html) and Chapter 2 of the online book *Efficient R Programming*, particularly sections 2.3 and 2.5, for details on R installation and [set-up](https://csgillespie.github.io/efficientR/set-up.html) and
[project management](https://csgillespie.github.io/efficientR/set-up.html#project-management).
]

It is also recommended that you install, and gain experience with, GitHub Desktop (or command-line git on Linux and Mac), Docker, Python, QGIS and transport modelling tools such as SUMO and A/B Street.
These software packages will help with the course but are not essential.

### Data science experience
Attendance at the one-off, 3-hour Introduction to R workshop (semester 1 Computer Skills workshop) and experience of using R (e.g. from work, previous degrees or an online course) are essential.
Students can demonstrate this with evidence that they have worked with R before or have completed an online course such as the first 4 sessions of the [RStudio Primers](https://rstudio.cloud/learn/primers) series or DataCamp’s [Free Introduction to R](https://www.datacamp.com/courses/free-introduction-to-r) course.
This is an advanced and research-led module.
Evidence of substantial programming and data science experience in previous professional or academic work, in languages such as R or Python, also constitutes sufficient prerequisite knowledge for the course.

## Reading list
<!-- The [reading list](http://lib5.leeds.ac.uk/rlists/broker/index.php?mod=TRAN5340M) is available from the Library website -->

### Essential
- Paper on the **stplanr** package for transport planning (available [online](https://cran.r-project.org/web/packages/stplanr/vignettes/stplanr-paper.html)) [@lovelace_stplanr_2017]
- Introductory and advanced content on geographic data in R, especially the [transport chapter](http://geocompr.robinlovelace.net/transport.html) (available free [online](http://geocompr.robinlovelace.net/)) [@lovelace_geocomputation_2018]
- Paper on analysing OpenStreetMap (OSM) data in Python (available [online](https://arxiv.org/pdf/1611.01890)) [@boeing_osmnx_2017]

### Core
- Introduction to data science with R (available free [online](http://r4ds.had.co.nz/)) [@grolemund_r_2016]
- Introductory textbook on machine learning, with lucid prose and worked examples in R (available free [online](http://www-bcf.usc.edu/~gareth/ISL/index.html)) [@james_introduction_2013]

### Optional
- Book on transport data science in Python [@fox_data_2018]
- For context, a report on the 'transport data revolution' [@transport_systems_catapult_transport_2015]
- Seminal text on visualisation (available [online](https://github.com/yowenter/books/blob/master/Design/Edward%20R%20Tufte%20-The%20Visual%20Display%20of%20Quantitative%20Information.pdf), style available in the [tufte](https://github.com/rstudio/tufte) R package) [@tufte_visual_2001]
- A paper on the use of SmartCard data [@gschwender_using_2016]
- An academic paper describing the development of a web application for the Department for Transport [@lovelace_propensity_2017]

<!-- Mayer-Schonberger V. and Cukier K. 2013. Big Data: A Revolution That -->
<!-- Will Transform How We Live, Work and Think. -->
<!-- John Murray publications. -->
<!-- - -->
<!-- Townsend, A.M. Smart Cities - Big Data, Civic Hackers, and the Quest -->
<!-- for a New Utopia. Norton 2014. -->
<!-- - -->
<!-- Lohr, S. 2015. Dataism. One World Publications. -->
<!-- Please also have a look at my own paper which is discussed in the first session: -->
<!-- 3- -->
<!-- Charles Fox, Peter Billington, Dominic Paulo, and Clive Cooper. -->
<!-- Origin destination analysis on the London orbital automated number plate -->
<!-- recognition network. In European Transport Conference, 2010. -->
<!-- A blog full of beautiful examples of data science perhaps you will contribute -->
<!-- to it one day? -->
<!-- - -->
<!-- www.informationisbeautiful.net -->
<!-- For fun - but making serious points about what might be possible with data: -->
<!-- - -->
<!-- Movie: Minority Report. 20th Century Fox, 2002. (Or read the Philip K -->
<!-- Dick short story if you prefer) -->
<!-- - -->
<!-- Video game: Sim City. Maxis, 2003. (or free clones such as http://www.opencity.info/) -->
<!-- If you like money, try searching this jobs website for data scientist salaries: -->

### Specific/online resources

- It's worth thinking about what you want to do next, so I recommend looking for 'data science' and transport jobs on sites such as www.cwjobs.co.uk
- For a maths refresher, it may be useful to have a maths textbook on hand covering concepts including vectors, matrices, eigenvectors, numerical parameter optimisation, calculus, differential equations, Gaussian distributions, Bayes' rule and covariance matrices.

## Bibliography