-
-
@@ -634,101 +634,106 @@
+Using R and Python to model future hospital activity: EARL Conference 2024
+YiWen Hon, Matt Dray, Tom Jemmett
+2024-09-05
+
+
Agile and scrum working
Chris Beeley
2024-08-22
-
+
Open source licensing: Or: how I learned to stop worrying and love openness
Chris Beeley
2024-05-30
-
+
GitHub as a team sport: DfT QA Month
Matt Dray
2024-05-23
-
+
Store Data Safely: Coffee & Coding
YiWen Hon, Matt Dray
2024-05-16
-
+
Coffee and Coding: Making my analytical workflow more reproducible with {targets}
Jacqueline Grout
2024-01-25
-
+
Conference Check-in App: NHS-R/NHS.pycom 2023
Tom Jemmett
2023-10-17
-
+
System Dynamics in health and care: fitting square data into round models
Sally Thompson
2023-10-09
-
+
Repeating Yourself with Functions: Coffee and Coding
Sally Thompson
2023-09-07
-
+
Coffee and Coding: Working with Geospatial Data in R
Tom Jemmett
2023-08-24
-
+
Unit testing in R: NHS-R Community Webinar
Tom Jemmett
2023-08-23
-
+
Everything you ever wanted to know about data science: but were too afraid to ask
Chris Beeley
2023-08-02
-
+
Travels with R and Python: the power of data science in healthcare
Chris Beeley
2023-08-02
-
+
An Introduction to the New Hospital Programme Demand Model: HACA 2023
Tom Jemmett
2023-07-11
-
+
What good data science looks like
Chris Beeley
2023-05-23
-
+
Text mining of patient experience data
Chris Beeley
2023-05-15
-
+
Coffee and Coding: {targets}
Tom Jemmett
2023-03-23
-
+
Collaborative working
Chris Beeley
2023-03-23
-
+
Coffee and Coding: Good Coding Practices
Tom Jemmett
2023-03-09
-
+
RAP: what is it and how can my team start using it effectively?
Chris Beeley
2023-03-09
-
+
Coffee and coding: Intro session
Chris Beeley
2023-02-23
diff --git a/presentations/su_presentation.scss b/presentations/su_presentation.scss
index 59c5910..5545823 100644
--- a/presentations/su_presentation.scss
+++ b/presentations/su_presentation.scss
@@ -1,211 +1,215 @@
-/*-- scss:defaults --*/
-$su-charcoal: #2c2825;
-$su-charcoal-l20: lighten($su-charcoal, 20%);
-$su-charcoal-l40: lighten($su-charcoal, 40%);
-$su-charcoal-l60: lighten($su-charcoal, 60%);
-$su-charcoal-l80: lighten($su-charcoal, 80%);
-
-$su-yellow: #f9bd07;
-$su-yellow-l20: lighten($su-yellow, 20%);
-$su-yellow-l40: lighten($su-yellow, 40%);
-$su-yellow-l60: lighten($su-yellow, 60%);
-$su-yellow-l80: lighten($su-yellow, 80%);
-
-$su-blue: #5881c1;
-$su-white: #f5f4f3;
-$su-red: #ec6555;
-$su-slate: #686f73;
-
-// load in font awesome icons
-@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.2.1/css/all.min.css");
-
-
-$body-bg: $su-white;
-$body-color: $su-charcoal-l20;
-$highlight-color: $su-yellow;
-$link-color: $su-blue;
-$footer-bg: $su-charcoal;
-$footer-fg: $su-yellow;
-
-$presentation-heading-color: $su-charcoal;
-
-/*-- scss:rules --*/
-
-.reveal .slide blockquote {
- border-left: 3px solid $text-muted;
- padding-left: 0.6em;
- background: #F6F7F7;
-}
-
-.reveal h2 {
- padding-bottom: 0.3em;
-}
-
-.reveal .footer {
- color: $su-yellow;
- background-color: $su-charcoal;
- display: block;
- position: fixed;
- bottom: 0px !important;
- padding-bottom: 12px;
- padding-top: 12px;
- width: 100%;
- text-align: center;
- font-size: 18px;
- z-index: 2;
-}
-
-// need to figure out how to add in :not(.panel-tabset-tabby)
-.reveal .slide ul {
- li {
- list-style: none;
- /* Remove default bullets */
- }
-
- li::before {
- content: "\25FC";
- /* Add content: \2022 is the CSS Code/unicode for a bullet; add a space for easier alignment! */
- color: $highlight-color;
- /* Change the color */
- display: inline-block;
- /* Needed to add space between the bullet and the text */
- width: 1.5em;
- /* Also needed for space (tweak if needed) */
- margin-left: -1.5em;
- /* Also needed for space (tweak if needed) */
- font-size: 66%;
- }
-}
-
-.reveal .progress span {
- background-color: $su-yellow;
-}
-
-.reveal .slide-logo {
- z-index: 3;
- bottom: -5px !important;
-}
-
-.slide-background:first-child {
- background-color: $su-charcoal;
- background-image: radial-gradient($su-charcoal-l20 1%, transparent 5%);
- background-position: 0 0, 10px 10px;
- background-size: 30px 30px;
- background-repeat: repeat;
- height: 100%;
- width: 100%;
-}
-
-.slide-background:first-child .slide-background-content {
- background-image: url("https://the-strategy-unit.github.io/assets/logo_yellow.svg");
- top: 0.5em;
- right: 0.5em;
- height: 4em;
- width: 4em;
- position: absolute;
-}
-
-#title-slide {
- text-align: left;
-}
-
-#title-slide h1 {
- font-size: 2em;
- color: $su-yellow !important;
-}
-
-#title-slide .subtitle {
- color: $su-charcoal-l60;
-}
-
-#title-slide .quarto-title-authors {
- justify-content: left;
- display: block;
-}
-
-#title-slide .quarto-title-author {
- padding-top: 1em;
- color: $su-red;
- padding: 0;
-}
-
-#title-slide .quarto-title-author a {
- color: $su-blue;
-}
-
-#title-slide p.institute {
- font-size: 0.75em;
- color: $su-charcoal-l60;
-}
-
-#title-slide p.date {
- font-size: 0.5em;
- color: $su-charcoal-l40;
-}
-
-.reveal .slide-logo:first-child {
- display: none;
-}
-
-.inverse {
- .slide-background-content {
- background-color: $su-charcoal;
- }
-
- h1,
- h2,
- h3,
- h4 {
- color: $su-charcoal-l40 !important;
- }
-}
-
-.reveal .imitate-title {
- font-size: 2em;
-}
-
-.reveal .inverse .imitate-title {
- color: $su-yellow !important;
-}
-
-.text-bottom {
- bottom: 1em;
- position: absolute;
-}
-
-.no-bullets ul {
- list-style-type: none;
- /* Remove bullets */
- padding: 0;
- /* Remove padding */
- margin: 0;
- /* Remove margins */
-}
-
-.small-table table {
- font-size: 1.65rem;
-}
-
-.small {
- font-size: 1.2rem;
-}
-
-.yellow {
- color: $su-yellow;
-}
-
-.light-yellow {
- color: $su-yellow-l40;
-}
-
-.light-charcoal {
- color: $su-charcoal-l40;
-}
-
-.very-light-charcoal {
- color: $su-charcoal-l60;
-}
-
-.center {
- text-align: center;
-}
\ No newline at end of file
+/*-- scss:defaults --*/
+$su-charcoal: #2c2825;
+$su-charcoal-l20: lighten($su-charcoal, 20%);
+$su-charcoal-l40: lighten($su-charcoal, 40%);
+$su-charcoal-l60: lighten($su-charcoal, 60%);
+$su-charcoal-l80: lighten($su-charcoal, 80%);
+
+$su-yellow: #f9bd07;
+$su-yellow-l20: lighten($su-yellow, 20%);
+$su-yellow-l40: lighten($su-yellow, 40%);
+$su-yellow-l60: lighten($su-yellow, 60%);
+$su-yellow-l80: lighten($su-yellow, 80%);
+
+$su-blue: #5881c1;
+$su-white: #f5f4f3;
+$su-red: #ec6555;
+$su-slate: #686f73;
+
+// load in font awesome icons
+@import url("https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.2.1/css/all.min.css");
+
+
+$body-bg: $su-white;
+$body-color: $su-charcoal-l20;
+$highlight-color: $su-yellow;
+$link-color: $su-blue;
+$footer-bg: $su-charcoal;
+$footer-fg: $su-yellow;
+
+$presentation-heading-color: $su-charcoal;
+
+/*-- scss:rules --*/
+
+.reveal .slide blockquote {
+ border-left: 3px solid $text-muted;
+ padding-left: 0.6em;
+ background: #F6F7F7;
+}
+
+.reveal h2 {
+ padding-bottom: 0.3em;
+}
+
+.reveal .footer {
+ color: $su-yellow;
+ background-color: $su-charcoal;
+ display: block;
+ position: fixed;
+ bottom: 0px !important;
+ padding-bottom: 12px;
+ padding-top: 12px;
+ width: 100%;
+ text-align: center;
+ font-size: 18px;
+ z-index: 2;
+}
+
+// need to figure out how to add in :not(.panel-tabset-tabby)
+.reveal .slide ul {
+ li {
+ list-style: none;
+ /* Remove default bullets */
+ }
+
+ li::before {
+ content: "\25FC";
+ /* Add content: \2022 is the CSS Code/unicode for a bullet; add a space for easier alignment! */
+ color: $highlight-color;
+ /* Change the color */
+ display: inline-block;
+ /* Needed to add space between the bullet and the text */
+ width: 1.5em;
+ /* Also needed for space (tweak if needed) */
+ margin-left: -1.5em;
+ /* Also needed for space (tweak if needed) */
+ font-size: 66%;
+ }
+}
+
+.reveal .progress span {
+ background-color: $su-yellow;
+}
+
+.reveal .slide-logo {
+ z-index: 3;
+ bottom: -5px !important;
+}
+
+.slide-background:first-child {
+ background-color: $su-charcoal;
+ background-image: radial-gradient($su-charcoal-l20 1%, transparent 5%);
+ background-position: 0 0, 10px 10px;
+ background-size: 30px 30px;
+ background-repeat: repeat;
+ height: 100%;
+ width: 100%;
+}
+
+.slide-background:first-child .slide-background-content {
+ background-image: url("https://the-strategy-unit.github.io/assets/logo_yellow.svg");
+ top: 0.5em;
+ right: 0.5em;
+ height: 4em;
+ width: 4em;
+ position: absolute;
+}
+
+#title-slide {
+ text-align: left;
+}
+
+#title-slide h1 {
+ font-size: 2em;
+ color: $su-yellow !important;
+}
+
+#title-slide .subtitle {
+ color: $su-charcoal-l60;
+}
+
+#title-slide .quarto-title-authors {
+ justify-content: left;
+ display: block;
+}
+
+#title-slide .quarto-title-author {
+ padding-top: 1em;
+ color: $su-red;
+ padding: 0;
+}
+
+#title-slide .quarto-title-author a {
+ color: $su-blue;
+}
+
+#title-slide p.institute {
+ font-size: 0.75em;
+ color: $su-charcoal-l60;
+}
+
+#title-slide p.date {
+ font-size: 0.5em;
+ color: $su-charcoal-l40;
+}
+
+.reveal .slide-logo:first-child {
+ display: none;
+}
+
+.inverse {
+ .slide-background-content {
+ background-color: $su-charcoal;
+ }
+
+ h1,
+ h2,
+ h3,
+ h4 {
+ color: $su-charcoal-l40 !important;
+ }
+}
+
+.reveal .imitate-title {
+ font-size: 2em;
+}
+
+.reveal .inverse .imitate-title {
+ color: $su-yellow !important;
+}
+
+.reveal .inverse {
+ color: $su-white;
+}
+
+.text-bottom {
+ bottom: 1em;
+ position: absolute;
+}
+
+.no-bullets ul {
+ list-style-type: none;
+ /* Remove bullets */
+ padding: 0;
+ /* Remove padding */
+ margin: 0;
+ /* Remove margins */
+}
+
+.small-table table {
+ font-size: 1.65rem;
+}
+
+.small {
+ font-size: 1.2rem;
+}
+
+.yellow {
+ color: $su-yellow;
+}
+
+.light-yellow {
+ color: $su-yellow-l40;
+}
+
+.light-charcoal {
+ color: $su-charcoal-l40;
+}
+
+.very-light-charcoal {
+ color: $su-charcoal-l60;
+}
+
+.center {
+ text-align: center;
+}
diff --git a/search.json b/search.json
index 18b1a32..e664706 100644
--- a/search.json
+++ b/search.json
@@ -187,7 +187,7 @@
"href": "presentations/index.html",
"title": "Presentations",
"section": "",
- "text": "Title\nAuthor\nDate\n\n\n\n\nAgile and scrum working\nChris Beeley\n2024-08-22\n\n\nOpen source licensing: Or: how I learned to stop worrying and love openness\nChris Beeley\n2024-05-30\n\n\nGitHub as a team sport: DfT QA Month\nMatt Dray\n2024-05-23\n\n\nStore Data Safely: Coffee & Coding\nYiWen Hon, Matt Dray\n2024-05-16\n\n\nCoffee and Coding: Making my analytical workflow more reproducible with {targets}\nJacqueline Grout\n2024-01-25\n\n\nConference Check-in App: NHS-R/NHS.pycom 2023\nTom Jemmett\n2023-10-17\n\n\nSystem Dynamics in health and care: fitting square data into round models\nSally Thompson\n2023-10-09\n\n\nRepeating Yourself with Functions: Coffee and Coding\nSally Thompson\n2023-09-07\n\n\nCoffee and Coding: Working with Geospatial Data in R\nTom Jemmett\n2023-08-24\n\n\nUnit testing in R: NHS-R Community Webinar\nTom Jemmett\n2023-08-23\n\n\nEverything you ever wanted to know about data science: but were too afraid to ask\nChris Beeley\n2023-08-02\n\n\nTravels with R and Python: the power of data science in healthcare\nChris Beeley\n2023-08-02\n\n\nAn Introduction to the New Hospital Programme Demand Model: HACA 2023\nTom Jemmett\n2023-07-11\n\n\nWhat good data science looks like\nChris Beeley\n2023-05-23\n\n\nText mining of patient experience data\nChris Beeley\n2023-05-15\n\n\nCoffee and Coding: {targets}\nTom Jemmett\n2023-03-23\n\n\nCollaborative working\nChris Beeley\n2023-03-23\n\n\nCoffee and Coding: Good Coding Practices\nTom Jemmett\n2023-03-09\n\n\nRAP: what is it and how can my team start using it effectively?\nChris Beeley\n2023-03-09\n\n\nCoffee and coding: Intro session\nChris Beeley\n2023-02-23"
+ "text": "Title\nAuthor\nDate\n\n\n\n\nUsing R and Python to model future hospital activity: EARL Conference 2024\nYiWen Hon, Matt Dray, Tom Jemmett\n2024-09-05\n\n\nAgile and scrum working\nChris Beeley\n2024-08-22\n\n\nOpen source licensing: Or: how I learned to stop worrying and love openness\nChris Beeley\n2024-05-30\n\n\nGitHub as a team sport: DfT QA Month\nMatt Dray\n2024-05-23\n\n\nStore Data Safely: Coffee & Coding\nYiWen Hon, Matt Dray\n2024-05-16\n\n\nCoffee and Coding: Making my analytical workflow more reproducible with {targets}\nJacqueline Grout\n2024-01-25\n\n\nConference Check-in App: NHS-R/NHS.pycom 2023\nTom Jemmett\n2023-10-17\n\n\nSystem Dynamics in health and care: fitting square data into round models\nSally Thompson\n2023-10-09\n\n\nRepeating Yourself with Functions: Coffee and Coding\nSally Thompson\n2023-09-07\n\n\nCoffee and Coding: Working with Geospatial Data in R\nTom Jemmett\n2023-08-24\n\n\nUnit testing in R: NHS-R Community Webinar\nTom Jemmett\n2023-08-23\n\n\nEverything you ever wanted to know about data science: but were too afraid to ask\nChris Beeley\n2023-08-02\n\n\nTravels with R and Python: the power of data science in healthcare\nChris Beeley\n2023-08-02\n\n\nAn Introduction to the New Hospital Programme Demand Model: HACA 2023\nTom Jemmett\n2023-07-11\n\n\nWhat good data science looks like\nChris Beeley\n2023-05-23\n\n\nText mining of patient experience data\nChris Beeley\n2023-05-15\n\n\nCoffee and Coding: {targets}\nTom Jemmett\n2023-03-23\n\n\nCollaborative working\nChris Beeley\n2023-03-23\n\n\nCoffee and Coding: Good Coding Practices\nTom Jemmett\n2023-03-09\n\n\nRAP: what is it and how can my team start using it effectively?\nChris Beeley\n2023-03-09\n\n\nCoffee and coding: Intro session\nChris Beeley\n2023-02-23"
},
{
"objectID": "presentations/2023-03-23_collaborative-working/index.html#introduction",
@@ -253,816 +253,746 @@
"text": "Data and .gitignore\n\nYour repo needs to be reproducible but also needs to be safe\nThe main branch should be reproducible by anyone at any time\n\nDocument package dependencies (using renv)\nDocument data loads if the data isn’t in the repo\n\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#section",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#section",
- "title": "Conference Check-in App",
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#the-new-hospital-programme-nhp",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#the-new-hospital-programme-nhp",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "The New Hospital Programme (NHP)",
+ "text": "The New Hospital Programme (NHP)\n\n\n\nA manifesto commitment\nFuture activity must be modelled\nNeed consistency across schemes\n\n\n\n\n\n\nBuilding new hospitals - replacing crumbling infrastructure in some cases, completely new builds in others.\nIt’s important to size the hospitals according to the type and quantity of activity there will be in the future.\nThere are many proprietary black box models in use for estimating healthcare activity in the future - no consistency, difficult to compare results\nStrategy Unit was asked to develop a model to be used across all of the builds: a model owned and operated by the NHS, for the NHS.\n\n\n::::"
+ },
+ {
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#model-process",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#model-process",
+ "title": "Using R and Python to model future hospital activity",
"section": "",
- "text": "digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/\n\n\nClark, Junebug. [Registration Desk for the LPC Conference], photograph, 2016-03-17/2016-03-19; (https://digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/: accessed October 16, 2023), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Special Collections."
+ "text": "A probabilistic Monte Carlo simulation that:\n\nTakes hospital activity from a baseline year, using NHS England’s Hospital Episode Statistics (HES) data\nApplies variables that:\n\nare outside of our control (e.g. population changes, using ONS projections)\ncan reduce hospital activity (mitigators, e.g. virtual wards or teleappointments)\n\nForecasts future demand based on these variables, outputting probabilistic predictive intervals"
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#qr-codes-are-great",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#qr-codes-are-great",
- "title": "Conference Check-in App",
- "section": "QR codes are great",
- "text": "QR codes are great"
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#our-challenges",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#our-challenges",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "Our challenges",
+ "text": "Our challenges\n\n28 hospitals currently using the model\nModel is being developed whilst in production\nModel is very complex - technically, and for end users\n\n\n\nHospitals are actively using the model while it is still in development, which can be tricky\nDataset is massive for each hospital - hundreds and thousands of rows - all activity for a hospital trust in one year\nModel can accommodate hundreds of different variables, understanding and setting these can be challenging for end users\nWe have comprehensive, openly available documentation and also a team of Model Relationship Managers to help address this"
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#and-can-be-easily-generated-in-r",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#and-can-be-easily-generated-in-r",
- "title": "Conference Check-in App",
- "section": "and can be easily generated in R",
- "text": "and can be easily generated in R\ninstall.packages(\"qrcode\")\nlibrary(qrcode)\n\nqr_code(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")"
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#tools-and-platforms",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#tools-and-platforms",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "Tools and platforms",
+ "text": "Tools and platforms\n\nData pipelines: {targets} , SQL \nModel: Python , Docker \nApps: {shiny} and {golem} , Posit Connect \nInfrastructure and storage: Azure \nDocumentation: Quarto \nVersion control and collaboration: Git , GitHub \n\n\n\nSo how did we solve the problem?\nHere’s a rundown of the tools and platforms that we use.\nThe data pipeline is orchestrated by {targets} for its recipe-like format and so we re-run only what needs re-running.\nThe model is built in Python and involves a lot of pandas DataFrame manipulations.\nWe use Azure for storage of model input data and JSON files of results.\nUsers input model paramters in one Shiny app and view results in another. This uses modules and {golem} for its package focus, as well as {bs4Dash}. We have development and productino environments.\nWe have a deployed Quarto website that contains the documentation for the whole project.\nIn general, we’re following the principles of Reproducible Analytical Pipelines (RAP) in everything we do.\nAll originally written by Tom.\nAs the team has grown we have shared responsibilities: YiWen in Python, Matt with Shiny, Tom as technical lead."
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#why-not",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#why-not",
- "title": "Conference Check-in App",
- "section": "Why not?",
- "text": "Why not?\n\n{shiny} would be doing all the processing on the server side\nwe would need to read from a camera client side\nthen stream video to the server for {shiny} to detect and decode the QR codes"
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#structure",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#structure",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "",
+ "text": "This is a simplified overview of the structure and flow of information through the system.\nThe full structure is quite complex, reflecting the complexity of user needs and the scale of the task.\nData from our database is processed and stored in Azure Storage Containers via a targets pipeline. Additional data, like ONS population projections, are also stored.\nThe users interact with a Shiny app to set their input parameters. The app provides some contextual information derived from the data held in Azure. Users click a button to run the model.\nThe model is deployed as a Docker container in Azure Continer Instances, triggered by an API call.\nThe model results are stored as JSON in an Azure container, ready for collection and presentation in an outputs app.\nUsers can view charts and tables and download files for further analysis.\nSo there’s clear front- and backends and we have\nFurther complexity is added by the need to process and present information despite changes to the model over time.\nWe use development and production environments for our apps to help reduce errors."
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work",
- "title": "Conference Check-in App",
- "section": "How does this work?",
- "text": "How does this work?\n\n\nFront-end\n\n\nuses the React JavaScript framework\n@yidel/react-qr-scanner\nApp scan’s a QR code, then sends this to our backend\nA window pops up to say who has checked in, or shows an error message"
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#outputs-app",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#outputs-app",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "",
+ "text": "Here’s a preview of the outputs app.\nIn the navbar you can see that users can aggregate by hospital sites; view charts and tables; and download results files for further processing.\nThere are also context-specific drodown menus to focus in on certain data. For example, to see results by activity type: inpatients, outpatients or A&E.\nIn this particular tab we can see a beeswarm plot showing each simulation as an individual point. This kind of presentation is important to remind users that the model outputs a distribution; that there are range of possibilities.\nThe data provided here to users is used to drive decisions about the size of hospital that will be developed."
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-1",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-1",
- "title": "Conference Check-in App",
- "section": "How does this work?",
- "text": "How does this work?\nBack-end\nUses the {plumber} R package to build the API, with endpoints for\n\ngetting the list of all of the attendees for that day\nuploading a list of attendees in bulk\nadding an attendee individually\ngetting an attendee\nchecking the attendee in"
+ "objectID": "presentations/2024-09-05_earl-nhp/index.html#next",
+ "href": "presentations/2024-09-05_earl-nhp/index.html#next",
+ "title": "Using R and Python to model future hospital activity",
+ "section": "Next",
+ "text": "Next\n\nForecast regionally and nationally\nMove data and pipelines into Databricks\nOpen-source model code\n\n\n\nWe’re currently working with hospitals and trusts, but we’re also expanding the geographical scale to produce results at the regional and national scale. This will require some thinking around processing, modelling and generating outputs.\nWe’re currently transferring data processing into Databricks, partly to bring all the steps into one platform but also as an opportunity to speed up the processing by using Spark.\nFinally, we already have some aspects in the open, like the project information site, but we’d also like to open-source the model code itself so that others can use and develop it."
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-2",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-2",
- "title": "Conference Check-in App",
- "section": "How does this work?",
- "text": "How does this work?\nMore Back-end Stuff\n\nuses a simple SQLite DB that will be thrown away at the end of the conference\nwe send personalised emails using {blastula} to the attendees with their QR codes\nthe QR codes are just random ids (UUIDs) that identify each attendee\nuses websockets to update all of the clients when a user checks in (to update the list of attendees)"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#targets-for-analysts",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#targets-for-analysts",
+ "title": "Coffee and Coding",
+ "section": "{targets} for analysts",
+ "text": "{targets} for analysts\n\n\n\nTom previously presented about {targets} at a coffee and coding last March and you can revisit his presentation and learn about the reasons why you should use the package to manage your pipeline and see a simple demonstration of how to use the package.\nMatt has presented previously about {targets} and making your workflows (pipelines) reproducible.\nSo….. if you aren’t really even sure why your pipeline needs managing as an analyst or whether you actually have one (you do) then links to their presentations are at the end"
},
{
- "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#learning-different-tools-can-show-you-the-light",
- "href": "presentations/2023-10-17_conference-check-in-app/index.html#learning-different-tools-can-show-you-the-light",
- "title": "Conference Check-in App",
- "section": "Learning different tools can show you the light",
- "text": "Learning different tools can show you the light\n\nunsplash.com/photos/tMGMINwFOtI"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#aims",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#aims",
+ "title": "Coffee and Coding",
+ "section": "Aims",
+ "text": "Aims\n\nIn this presentation we aim to demonstrate the real-world use of {targets} in an analysis project, but first a brief explanation\n\n\n\nWithout {targets} we\n\n\nWrite a script\nExecute script\nMake changes\nGo to step 2\n\n\n\nWith {targets} we will\n\n\nlearn how the various stages of our analysis fit together\nsave time by only running necessary stages as we cycle through the process\nhelp future you and colleagues re-visiting the analysis - Matt says “its like a time-capsule”\nmake Reproducible Analytical Pipelines\n\n\nsource: The {targets} R package user manual"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-data-science",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-data-science",
- "title": "Travels with R and Python",
- "section": "What is data science?",
- "text": "What is data science?\n\n“A data scientist knows more about computer science than the average statistician, and more about statistics than the average computer scientist”\n\n(Josh Wills, a former head of data engineering at Slack)"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#explain-the-live-project",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#explain-the-live-project",
+ "title": "Coffee and Coding",
+ "section": "Explain the live project",
+ "text": "Explain the live project\n\noriginal project had 30+ metrics\nmultiple inter-related processing steps\neach time a metric changed or a process was altered it impacted across the project\nthere was potential for mistakes, duplication, lots of wasted time\nusing targets provides a structure that handles these inter-relationships"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#drew-conways-famous-venn-diagram",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#drew-conways-famous-venn-diagram",
- "title": "Travels with R and Python",
- "section": "Drew Conway’s famous Venn diagram",
- "text": "Drew Conway’s famous Venn diagram\n\nSource"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#how-targets-can-help",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#how-targets-can-help",
+ "title": "Coffee and Coding",
+ "section": "How {targets} can help",
+ "text": "How {targets} can help\n\ngets you thinking about your analysis and its building blocks\ntargets forces you into a functions approach to workflow\nentire pipeline is reproducible\nvisualise on one page\nsaves time\n(maybe we need an advanced function writing session in another C&C?)"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science",
- "title": "Travels with R and Python",
- "section": "What are the skills of data science?",
- "text": "What are the skills of data science?\n\nAnalysis\n\nML\nStats\nData viz\n\nSoftware engineering\n\nProgramming\nSQL/ data\nDevOps\nRAP"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#demonstration-in-a-live-project",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#demonstration-in-a-live-project",
+ "title": "Coffee and Coding",
+ "section": "Demonstration in a live project",
+ "text": "Demonstration in a live project\nLet’s look at a real life example in a live project…"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science-1",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science-1",
- "title": "Travels with R and Python",
- "section": "What are the skills of data science?",
- "text": "What are the skills of data science?\n\nDomain knowledge\n\nCommunication\nProblem formulation\nDashboards and reports"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#visualising",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#visualising",
+ "title": "Coffee and Coding",
+ "section": "Visualising",
+ "text": "Visualising\nCurrent project in {targets} and visualised with tar_visnetwork()"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#stats-and-data-viz",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#stats-and-data-viz",
- "title": "Travels with R and Python",
- "section": "Stats and data viz",
- "text": "Stats and data viz\n\nML leans a bit more towards atheoretical prediction\nStats leans a bit more towards inference (but they both do both)\nData scientists may use different visualisations\n\nInteractive web based tools\nDashboard based visualisers e.g. {stminsights}"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#code",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#code",
+ "title": "Coffee and Coding",
+ "section": "Code",
+ "text": "Code\n\nit’s like a recipe of steps\nit’s easier to read\nyou have built functions which you can transfer and reuse\nit’s efficient, good practice\ndebugging is easier because if/when it fails you know exactly which target it has failed on\nit creates intermediate cached objects you can fetch at any time"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#software-engineering",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#software-engineering",
- "title": "Travels with R and Python",
- "section": "Software engineering",
- "text": "Software engineering\n\nProgramming\n\nNo/ low code data science?\n\nSQL/ data\n\nTend to use reproducible automated processes\n\nDevOps\n\nPlan, code, build, test, release, deploy, operate, monitor\n\nRAP\n\nI will come back to this"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#how-can-i-start-using-it",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#how-can-i-start-using-it",
+ "title": "Coffee and Coding",
+ "section": "How can I start using it?",
+    "text": "How can I start using it?\n\nYou could “retro-fit” it to your project, but … ideally you should start your project off using {targets}\nThere are at least three of us in SU who have used it in our projects.\nWe are offering to hand-hold you to get started with your next project.\nMatt, Tom, Jacqueline"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#domain-knowledge",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#domain-knowledge",
- "title": "Travels with R and Python",
- "section": "Domain knowledge",
- "text": "Domain knowledge\n\nDo stuff that matters\n\nThe best minds of my generation are thinking about how to make people click ads. That sucks. Jeffrey Hammerbacher\n\nConvince other people that it matters\nThis is the hardest part of data science"
+ "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#useful-targets-links",
+ "href": "presentations/2024-01-25_coffee-and-coding/index.html#useful-targets-links",
+ "title": "Coffee and Coding",
+ "section": "Useful {targets} links",
+    "text": "Useful {targets} links\n\nTom’s previous coffee and coding presentation\nMatt’s previous presentations\nThe {targets} documentation is detailed and easy to follow.\nA demo repository demonstrated in last week’s NHSE C&C\nSoftware Carpentry are developing a course: Pre-alpha targets course\nLive project demonstrated in this presentation using {targets}\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#rap",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#rap",
- "title": "Travels with R and Python",
- "section": "RAP",
- "text": "RAP\n\nData science isn’t RAP\nRAP isn’t data science\nThey are firm friends"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here",
+ "title": "Agile and scrum working",
+ "section": "How did we get here?",
+    "text": "How did we get here?\n\nWaterfall approaches were used in the early days of software development\n\nRequirements; Design; Development; Integration; Testing; Deployment\n\nYou only move to the next stage when the previous one is complete\n(although actually it turns out you kind of don’t…)"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#reproducibility",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#reproducibility",
- "title": "Travels with R and Python",
- "section": "Reproducibility",
- "text": "Reproducibility\n\nReproducibility in science\nThe $6B spreadsheet error\nGeorge Osbourne’s austerity was based on a spreadsheet error\nFor us, reproducibility also means we can do the same analysis 50 times in one minute\n\nWhich is why I started down the road of data science"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile",
+ "title": "Agile and scrum working",
+ "section": "The road to agile",
+    "text": "The road to agile\n\nSome of the ideas for agile floated around in the 20th century\nShewhart’s Plan-Do-Study-Act cycle\nThe New New Product Development Game in 1986\nScrum (which we’ll return to) was proposed in 1993\nIn 2001 the Manifesto for Agile Software Development was published"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-rap",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-rap",
- "title": "Travels with R and Python",
- "section": "What is RAP",
- "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n\nGoldacre review"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto",
+ "title": "Agile and scrum working",
+ "section": "The agile manifesto",
+ "text": "The agile manifesto\n\nCopyright © 2001 Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick\nRobert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas\nthis declaration may be freely copied in any form, but only in its entirety through this notice."
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--baseline",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--baseline",
- "title": "Travels with R and Python",
- "section": "Levels of RAP- Baseline",
- "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL)\nCode is version controlled\nRepository includes a README.md file that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed\nCode is published in the open and linked to & from accompanying publication (if relevant)\n\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp",
+ "title": "Agile and scrum working",
+ "section": "Agile principles- software and the MVP",
+ "text": "Agile principles- software and the MVP\n\nOur highest priority is to satisfy the customer through early and continuous delivery of valuable software.\nDeliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.\nWorking software is the primary measure of progress.\n\n(these principles and those on following slides copyright Ibid.)"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--silver",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--silver",
- "title": "Travels with R and Python",
- "section": "Levels of RAP- Silver",
- "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\nData is handled and output in a Tidy data format\n\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers",
+ "title": "Agile and scrum working",
+ "section": "Agile principles- working with customers",
+ "text": "Agile principles- working with customers\n\nWelcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.\nBusiness people and developers must work together daily throughout the project."
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--gold",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--gold",
- "title": "Travels with R and Python",
- "section": "Levels of RAP- Gold",
- "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork",
+ "title": "Agile and scrum working",
+ "section": "Agile principles- teamwork",
+ "text": "Agile principles- teamwork\n\nBuild projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.\nThe most efficient and effective method of conveying information to and within a development team is face-to-face conversation.\nThe best architectures, requirements, and designs emerge from self-organizing teams.\nAt regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly."
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#data-science-in-healthcare",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#data-science-in-healthcare",
- "title": "Travels with R and Python",
- "section": "Data science in healthcare",
- "text": "Data science in healthcare\n\nForecasting\n\nStats versus ML\n\nText mining\n\nR versus Python\n\nDemand modelling\n\nDevOps as a way of life"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management",
+ "title": "Agile and scrum working",
+ "section": "Agile principles- project management",
+ "text": "Agile principles- project management\n\nAgile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.\nContinuous attention to technical excellence and good design enhances agility.\nSimplicity–the art of maximizing the amount of work not done–is essential."
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#get-involved",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#get-involved",
- "title": "Travels with R and Python",
- "section": "Get involved!",
- "text": "Get involved!\n\nNHS-R community\n\nWebinars, training, conference, Slack\n\nNHS Pycom\n\nditto…\n\nMLCSU GitHub?\nBuild links with the other CSUs"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage",
+ "title": "Agile and scrum working",
+ "section": "The agile advantage",
+ "text": "The agile advantage\n\nBetter use of fixed resources to deliver an unknown outcome, rather than unknown resources to deliver a fixed outcome\nContinuous delivery"
},
{
- "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#contact",
- "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#contact",
- "title": "Travels with R and Python",
- "section": "Contact",
- "text": "Contact\n\n\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\n\n\n\n chris.beeley1@nhs.net\n chrisbeeley\n\n\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep",
+ "title": "Agile and scrum working",
+ "section": "Feature creep",
+ "text": "Feature creep\n\nUsers ask for: everything they need, everything they think they may need, everything they want, everything they think they may want\n\n“every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can”\n\nZawinski’s Law- Source"
},
{
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman",
- "title": "Open source licensing",
- "section": "A note on Richard Stallman",
- "text": "A note on Richard Stallman\n\nRichard Stallman has been heavily criticised for some of this views\nHe is hard to ignore when talking about open source so I am going to talk about him\nNothing in this talk should be read as endorsing any of his comments outside (or inside) the world of open source"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback",
+ "title": "Agile and scrum working",
+ "section": "Regular stakeholder feedback",
+    "text": "Regular stakeholder feedback\n\nAgile teams are very responsive to product feedback\nThe project we’re currently working on is very agile whether we like it or not\nOur customers never know what they want until we show them something they don’t want"
},
{
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source",
- "title": "Open source licensing",
- "section": "The origin of open source",
- "text": "The origin of open source\n\nIn the 50s and 60s source code was routinely shared with hardware and users were often expected to modify to run on their hardware\nBy the late 1960s the production cost of software was rising relative to hardware and proprietary licences became more prevalent\nIn 1980 Richard Stallman’s department at MIT took delivery of a printer they were not able to modify the source code for\nRichard Stallman launched the GNU project in 1983 to fight for software freedoms\nMIT licence was launched in the late 1980s\nCathedral and the bazaar was released in 1997 (more on which later)"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages",
+ "title": "Agile and scrum working",
+ "section": "More agile advantages",
+ "text": "More agile advantages\n\nEarly and cheap failure\nContinuous testing and QA\nReduction in unproductive work\nTeam can improve regularly, not just the product"
},
{
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source",
- "title": "Open source licensing",
- "section": "What is open source?",
- "text": "What is open source?\n\nThink free as in free speech, not free beer (Stallman)\n\n\nOpen source does not mean free of charge! Software freedom implies the ability to sell code\nFree of charge does not mean open source! Many free to download pieces of software are not open source (Zoom, for example)\n\n\nBy Chao-Kuei et al. - https://www.gnu.org/philosophy/categories.en.html, GPL, Link"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods",
+ "title": "Agile and scrum working",
+ "section": "Agile methods",
+ "text": "Agile methods\n\nThere are lots of agile methodologies\nI’m not going to embarrass myself by pretending to understand them\nExamples include Lean, Crystal, and Extreme Programming"
},
{
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms",
- "title": "Open source licensing",
- "section": "The four freedoms",
- "text": "The four freedoms\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 1: The freedom to study how the program works, and change it to make it do what you wish.\nFreedom 2: The freedom to redistribute and make copies so you can help your neighbor.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits."
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum",
+ "title": "Agile and scrum working",
+ "section": "Scrum",
+    "text": "Scrum\n\nScrum is the agile methodology we have adopted\nDespite dire warnings to the contrary we have not adopted it wholesale, but we do follow most of its principles\nThe fundamental organising principle of work in scrum is a sprint lasting 1-4 weeks\nEach sprint finishes with a defined and useful piece of software that can be shown to/ used by customers"
},
{
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar",
- "title": "Open source licensing",
- "section": "Cathedral and the bazaar",
- "text": "Cathedral and the bazaar\n\nEvery good work of software starts by scratching a developer’s personal itch.\nGood programmers know what to write. Great ones know what to rewrite (and reuse).\nPlan to throw one [version] away; you will, anyhow (copied from Frederick Brooks’s The Mythical Man-Month).\nIf you have the right attitude, interesting problems will find you.\nWhen you lose interest in a program, your last duty to it is to hand it off to a competent successor.\nTreating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.\nRelease early. Release often. And listen to your customers.\nGiven a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.\nSmart data structures and dumb code works a lot better than the other way around.\nIf you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource."
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.",
- "title": "Open source licensing",
- "section": "Cathedral and the bazaar (cont.)",
- "text": "Cathedral and the bazaar (cont.)\n\nThe next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.\nOften, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.\nPerfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away. (Attributed to Antoine de Saint-Exupéry)\nAny tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.\nWhen writing gateway software of any kind, take pains to disturb the data stream as little as possible—and never throw away information unless the recipient forces you to!\nWhen your language is nowhere near Turing-complete, syntactic sugar can be your friend.\nA security system is only as secure as its secret. Beware of pseudo-secrets.\nTo solve an interesting problem, start by finding a problem that is interesting to you.\nProvided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one."
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science",
- "title": "Open source licensing",
- "section": "The disciplines of open source are the disciplines of good data science",
- "text": "The disciplines of open source are the disciplines of good data science\n\nMeaningful README\nMeaningful commit messages\nModularity\nSeparating data code from analytic code from interactive code\nAssigning issues and pull requests for action/ review\nDon’t forget one of the most lazy and incompetent developers you will ever work with is yourself, six months later"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist",
- "title": "Open source licensing",
- "section": "What licences exist?",
- "text": "What licences exist?\n\nPermissive\n\nSuch as MIT but there are others. Recommended by NHSX draft guidelines on open source\nApache is a notable permissive licence- includes a patent licence\nIn our work the OGL is also relevant- civil servant publish stuff under OGL (and MIT- it isn’t particularly recommended for code)\n\nCopyleft\n\nGPL2, GPL3, AGPL (“the GPL of the web”)\nNote that the provisions of the GPL only apply when you distribute the code\nAt a certain point it all gets too complicated and you need a lawyer\nMPL is a notable copyleft licence- can combine with proprietary code as long as kept separate\n\nArguments for permissive/ copyleft- getting your code used versus preserving software freedoms for other people\nNote that most of the licences are impossible to read! There is a website to explain tl;dr"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter",
- "title": "Open source licensing",
- "section": "What is copyright and why does it matter",
- "text": "What is copyright and why does it matter\n\nCopyright is assigned at the moment of creation\nIf you made it in your own time, it’s yours (usually!)\nIf you made it at work, it belongs to your employer\nIf someone paid you to make it (“work for hire”) it belongs to them\nCrucially, the copyright holder can relicence software\n\nIf it’s jointly authored it depends if it’s a “collective” or “joint” work\nHonestly it’s pretty complicated. Just vest copyright in an organisation or group of individuals you trust\nGoldacre review suggests using Crown copyright for copyright in the NHS because it’s a “shoal, not a big fish” (with apologies to Ben whom I am misquoting)"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel",
- "title": "Open source licensing",
- "section": "Iceweasel",
- "text": "Iceweasel\n\nIceweasel is a story of trademark rather than copyright\nDebian (a Linux flavour) had the permission to use the source code of Firefox, but not the logo\nSo they took the source code and made their own version\nThis sounds very obscure and unimportant but it could become important in future projects of ours, like…"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects",
- "title": "Open source licensing",
- "section": "What we have learned in recent projects",
- "text": "What we have learned in recent projects\n\nThe huge benefits of being open\n\nTransparency\nWorking with customers\nGoodwill\n\nNonfree mitigators\nDifferent licences for different repos"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like",
- "title": "Open source licensing",
- "section": "Software freedom means allowing people to do stuff you don’t like",
- "text": "Software freedom means allowing people to do stuff you don’t like\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.\nThe code isn’t the only thing with worth in the project\nThis is why there are whole businesses founded on “here’s the Linux source code”\nSo when we’re sharing code we are letting people do stupid things with it but we’re not recommending that they do stupid things with it\nPeople do stupid things with Excel and Microsoft don’t accept liability for that, and neither should we\nThis issue of sharing analytic code and merchantability for a particular purpose is poorly understood and I think everyone needs to be clearer on it (us, and our customers)\nIn my view a world where consultants are selling our code is better than a world where they’re selling their spreadsheets"
- },
- {
- "objectID": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano",
- "href": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano",
- "title": "Open source licensing",
- "section": "“Open source as in piano”",
- "text": "“Open source as in piano”\n\nThe patient experience QDC project\nOur current project\nOpen source code is not necessarily to be run, but understood and learned from\nBuilding a group of people who can use and contribute to your code is arguably as important as writing it\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
- },
- {
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap",
- "title": "RAP",
- "section": "What is RAP",
- "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\nGoldacre review"
- },
- {
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve",
- "title": "RAP",
- "section": "What are we trying to achieve?",
- "text": "What are we trying to achieve?\n\nLegibility\nReproducibility\nAccuracy\nLaziness"
- },
- {
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles",
- "title": "RAP",
- "section": "What are some of the fundamental principles?",
- "text": "What are some of the fundamental principles?\n\nPredictability, reducing mental load, and reducing truck factor\nMaking it easy to collaborate with yourself and others on different computers, in the cloud, in six months’ time…\nDRY"
- },
- {
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap",
- "title": "RAP",
- "section": "The road to RAP",
- "text": "The road to RAP\n\nWe’re roughly using NHS Digital’s RAP stages\nThere is an incredibly large amount to learn!\nConfession time! (everything I do not know…)\nYou don’t need to do it all at once\nYou don’t need to do it all at all ever\nEach thing you learn will incrementally help you\nRemember- that’s why we learnt this stuff. Because it helped us. And it can help you too"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner",
+ "title": "Agile and scrum working",
+ "section": "Product owner",
+    "text": "Product owner\n\nThis person is responsible for the backlog- what goes into the sprint\nThe backlog should include all of the things that customers want or might want\nThe backlog should be prioritised\nThe product owner does this through deep and frequent conversations with customers"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline",
- "title": "RAP",
- "section": "Levels of RAP- Baseline",
- "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL).\nCode is version controlled (see Git basics and using Git collaboratively guides).\nRepository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed.\nCode is published in the open and linked to & from accompanying publication (if relevant).\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team",
+ "title": "Agile and scrum working",
+ "section": "Scrum master helps the scrum team",
+ "text": "Scrum master helps the scrum team\n\n“By coaching the team members in self-management and cross-functionality\nFocus on creating high-value Increments that meet the Definition of Done\nInfluence the removal of impediments to the Scrum Team’s progress\nEnsure that all Scrum events take place and are positive, productive, and kept within the timebox.”\n\nSource"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver",
- "title": "RAP",
- "section": "Levels of RAP- Silver",
- "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. requirements.txt, PipFile, environment.yml\nData is handled and output in a Tidy data format\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog",
+ "title": "Agile and scrum working",
+ "section": "The backlog",
+    "text": "The backlog\n\nHaving an accurate and well-prioritised backlog is key\nDon’t estimate the backlog in hours- use “T-shirt sizes” or “points”\nPeople are terrible at estimating how long things take- particularly in software\nEverything in the backlog needs a defined “Done” state"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold",
- "title": "RAP",
- "section": "Levels of RAP- Gold",
- "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\nSource: NHS Digital RAP community of practice"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning",
+ "title": "Agile and scrum working",
+ "section": "Sprint planning",
+    "text": "Sprint planning\n\nThe team, the product owner, and the scrum master plan the sprint\nSprints should be a fixed length of time less than one month\nThe sprint cannot be changed or added to (we break this rule)\nThe team works autonomously in the sprint- nobody decides who does what except the team\nPlanning can take three hours, and should if it needs to"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there",
- "title": "RAP",
- "section": "A learning journey to get you there",
- "text": "A learning journey to get you there\n\nCode style, organising your files\nFunctions and iteration\nGit and GitHub\nPackaging your code\nTesting\nPackage management and versioning"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#standup",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#standup",
+ "title": "Agile and scrum working",
+ "section": "Standup",
+    "text": "Standup\n\nEvery day, for no more than 15 minutes (teams often stand up to reinforce this rule), the team and scrum master meet\nEach person answers three questions\n\nWhat did you do yesterday to help the team finish the sprint?\nWhat will you do today to help the team finish the sprint?\nIs there an obstacle blocking you or the team from achieving the sprint goal?"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there",
- "title": "RAP",
- "section": "How we can help each other get there",
- "text": "How we can help each other get there\n\nWork as a team!\nCoffee and coding!\nAsk for help!\nDo pair coding!\nGet your code reviewed!\nJoin the NHS-R/ NHSPycom communities"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro",
+ "title": "Agile and scrum working",
+ "section": "Sprint retro",
+ "text": "Sprint retro\n\nWhat went well, what could have gone better, and what to improve next time\nLooking at process, not blaming individuals\nRequires maturity and trust to bring up issues, and to respond to them in a constructive way\nShould agree at the end on one process improvement which goes in the next sprint\nWe’ve had some really, really good retros and I think it’s a really important process for a team"
},
{
- "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca",
- "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca",
- "title": "RAP",
- "section": "HACA",
- "text": "HACA\n\nThe first national analytics conference for health and care\nInsight to action!\nJuly 11th and 12th, University of Birmingham\nAccepting abstracts for short and long talks and posters\nAbstract deadline 27th March\nHelp is available (with abstract, poster, preparing presentation…)!\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective",
+ "title": "Agile and scrum working",
+ "section": "Team perspective",
+ "text": "Team perspective\n\nProduct owner- that’s me\n\nFocus, clarity and transparency, team delivery, clear and appropriate responsibilities\n\nScrum master- YiWen\nTeam member- Matt\nTeam member- Rhian"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines",
- "title": "System Dynamics in health and care",
- "section": "Health Data in the Headlines",
- "text": "Health Data in the Headlines\n\n\n\n\nUsed to seeing headlines that give a snapshot figure but doesn’t say much about the system.\nNow starting to see headlines that recognise flow through the system rather than snapshot in time of just one part.\nCan get better understanding of the issues in a system if we can map it as stocks and flows, but our datasets not designed to give up this information very readily. This talk is how I have tried to meet that challenge."
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values",
+ "title": "Agile and scrum working",
+ "section": "Scrum values",
+ "text": "Scrum values\n\nCourage\nFocus\nCommitment\nRespect\nOpenness"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens",
- "title": "System Dynamics in health and care",
- "section": "Through the System Dynamics lens",
- "text": "Through the System Dynamics lens\n\nStock-flow model\nDynamic behaviour, feedback loops\n\nIn a few seconds, what is SD?\nAn approach to understanding the behaviour of complex systems over time. A method of mapping a system as stocks, whose levels can only change due to flows in and flows out. Stocks could be people on a waiting list, on a ward, money, …\nFlows are the rate at which things change in a given time period e.g. admissions per day, referrals per month.\nBehaviour of the system is determined by how the components interact with each other, not what each component does. Mapping the structure of a system like this leads us to identify feedback loops, and consequences of an action - both intended and unintended.\nIn this capacity-constrained model we only need 3 parameters to run the model (exogenous). All the behaviour within the grey box is determined by the interactions of those components (indogenous).\nHow do we get a value/values for referrals per day?\n(currently use specialist software to build and run our models, aim is to get to a point where we can run in open source.)"
+ "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software",
+ "href": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software",
+ "title": "Agile and scrum working",
+ "section": "Using agile outside of software",
+ "text": "Using agile outside of software\n\nData science is outside of software (IMHO)\n\nWe don’t have daily standups and some of our processes run longer than in software development\n\nYou can build cars with Agile\nMarketing and UX design\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows",
- "title": "System Dynamics in health and care",
- "section": "Determining flows",
- "text": "Determining flows\n\n\n\n\n‘admissions per day’ is needed to populate the model.\n‘discharged’ could be used to verify the model against known data\n\nHow many admissions per day (or week, month…)\n\n\n\n\n\n\n\n \n\n\nGoing to use very simple model shown to explain how to extract flow data for admissions. Will start with visual explainer before going into the code.\n1. generate list of key dates (in this case daily, could be weekly, monthly)\n2. take our patient-level ID with admission and discharge dates\n3. count of admissions on that day/week"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today",
+ "title": "Coffee and Coding",
+ "section": "Packages we are using today",
+ "text": "Packages we are using today\n\nlibrary(tidyverse)\n\nlibrary(sf)\n\nlibrary(tidygeocoder)\nlibrary(PostcodesioR)\n\nlibrary(osrm)\n\nlibrary(leaflet)"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy",
- "title": "System Dynamics in health and care",
- "section": "Determining occupancy",
- "text": "Determining occupancy\n\n\n\n\n‘on ward’ is used to verify the model against known data\n\nLogic statement testing if the key date is wholly between admission and discharge dates\nflag for a match \n\n\n\n\n\n\n \n\n\nMight also want to generate occupancy, to compare the model output with actual data to verify/validate.\n1. generate list of key dates\n2. take our patient-level ID with admission and discharge dates\n3. going to take each date in our list of keydates, and see if there is an admission before that date and discharge after 4. this creates a wide data frame, the same length as patient data.\n5. once run through all the dates in the list, sum each column\nPatient A admitted on 2nd, so only starts being classed as resident on 3rd."
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data",
+ "title": "Coffee and Coding",
+ "section": "Getting boundary data",
+    "text": "Getting boundary data\nWe can use the ONS’s Geoportal to grab boundary data and generate maps\n\n\n\nicb_url <- paste0(\n  \"https://services1.arcgis.com\",\n  \"/ESMARspQHYMw9BZ9/arcgis\",\n  \"/rest/services\",\n  \"/Integrated_Care_Boards_April_2023_EN_BGC\",\n  \"/FeatureServer/0/query\",\n  \"?outFields=*&where=1%3D1&f=geojson\"\n)\nicb_boundaries <- read_sf(icb_url)\n\nicb_boundaries |>\n  ggplot() +\n  geom_sf() +\n  theme_void()"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows",
- "title": "System Dynamics in health and care",
- "section": "in R - flows",
- "text": "in R - flows\nEasy to do with count, or group_by and summarise\n\n\n admit_d <- spell_dates |> \n group_by(date_admit) |>\n count(date_admit)\n\nhead(admit_d)\n\n\n# A tibble: 6 × 2\n# Groups: date_admit [6]\n date_admit n\n <date> <int>\n1 2022-01-01 28\n2 2022-01-02 24\n3 2022-01-03 21\n4 2022-01-04 27\n5 2022-01-05 32\n6 2022-01-06 27"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data",
+ "title": "Coffee and Coding",
+ "section": "What is the icb_boundaries data?",
+ "text": "What is the icb_boundaries data?\n\nicb_boundaries |>\n select(ICB23CD, ICB23NM)\n\nSimple feature collection with 42 features and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -6.418667 ymin: 49.86479 xmax: 1.763706 ymax: 55.81112\nGeodetic CRS: WGS 84\n# A tibble: 42 × 3\n ICB23CD ICB23NM geometry\n <chr> <chr> <MULTIPOLYGON [°]>\n 1 E54000008 NHS Cheshire and Merseyside Integrated C… (((-3.083264 53.2559, -3…\n 2 E54000010 NHS Staffordshire and Stoke-on-Trent Int… (((-1.950489 53.21188, -…\n 3 E54000011 NHS Shropshire, Telford and Wrekin Integ… (((-2.380794 52.99841, -…\n 4 E54000013 NHS Lincolnshire Integrated Care Board (((0.2687853 52.81584, 0…\n 5 E54000015 NHS Leicester, Leicestershire and Rutlan… (((-0.7875237 52.97762, …\n 6 E54000018 NHS Coventry and Warwickshire Integrated… (((-1.577608 52.67858, -…\n 7 E54000019 NHS Herefordshire and Worcestershire Int… (((-2.272042 52.43972, -…\n 8 E54000022 NHS Norfolk and Waveney Integrated Care … (((1.666741 52.31366, 1.…\n 9 E54000023 NHS Suffolk and North East Essex Integra… (((0.8997023 51.7732, 0.…\n10 E54000024 NHS Bedfordshire, Luton and Milton Keyne… (((-0.4577115 52.32009, …\n# ℹ 32 more rows"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy",
- "title": "System Dynamics in health and care",
- "section": "in R - occupancy",
- "text": "in R - occupancy\nGenerate list of key dates\n\n\n\ndate_start <- dmy(01012022) \ndate_end <- dmy(31012022)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"day\"))\n\nkeydates <- data.frame(\n date = c(seq(date_start, by = \"day\", length.out=run_len))) \n\n\n\n\n date\n1 2022-01-01\n2 2022-01-02\n3 2022-01-03\n4 2022-01-04\n5 2022-01-05\n6 2022-01-06\n\n\n\n\nStart by generating the list of keydates. In this example we’re running the model in days, and checking each day in 2022.\nNeed the run length for the next step, to know how many times to iterate over"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes",
+ "title": "Coffee and Coding",
+ "section": "Working with geospatial dataframes",
+ "text": "Working with geospatial dataframes\nWe can simply join sf data frames and “regular” data frames together\n\n\n\nicb_metrics <- icb_boundaries |>\n st_drop_geometry() |>\n select(ICB23CD) |>\n mutate(admissions = rpois(n(), 1000000))\n\nicb_boundaries |>\n inner_join(icb_metrics, by = \"ICB23CD\") |>\n ggplot() +\n geom_sf(aes(fill = admissions)) +\n scale_fill_viridis_c() +\n theme_void()"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1",
- "title": "System Dynamics in health and care",
- "section": "in R - occupancy",
- "text": "in R - occupancy\nIterate over each date - need to have been admitted before, and discharged after\n\noccupancy_flag <- function(df) {\n\n # pre-allocate tibble size to speed up iteration in loop\n activity_all <- tibble(nrow = nrow(df)) |> \n select()\n \n for (i in 1:run_len) {\n \n activity_period <- case_when(\n \n # creates 1 flag if resident for complete day\n df$date_admit < keydates$keydate[i] & \n df$date_discharge > keydates$keydate[i] ~ 1,\n TRUE ~ 0)\n \n # column bind this day's flags to previous\n activity_all <- bind_cols(activity_all, activity_period)\n \n }\n \n # rename column to match the day being counted\n activity_all <- activity_all |> \n setNames(paste0(\"d_\", keydates$date))\n \n # bind flags columns to patient data\n daily_adm <- bind_cols(df, activity_all) |> \n pivot_longer(\n cols = starts_with(\"d_\"),\n names_to = \"date\",\n values_to = \"count\"\n ) |> \n \n group_by(date) |> \n summarise(resident = sum(count)) |> \n ungroup() |> \n mutate(date = str_remove(date, \"d_\"))\n \n } \n\n\nIs there a better way than using a for loop?\n\nPre-allocate tibbles\nactivity_all will end up as very wide tibble, with a column for each date in list of keydates.\nFor each date in the list of key dates, compares with admission date & discharge date; need to be admitted before the key date and discharged after the key date. If match, flag = 1.\nCreates a column for each day, then binds this to activity all.\nRename each column with the date it was checking (add a character to start of column name so column doesn’t start with numeric)\nPivot long, then group by date and sum the flags (other variables could be added here, such as TFC or provider code)"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames",
+ "title": "Coffee and Coding",
+ "section": "Working with geospatial data frames",
+ "text": "Working with geospatial data frames\nWe can manipulate sf objects like other data frames\n\n\n\nlondon_icbs <- icb_boundaries |>\n filter(ICB23NM |> stringr::str_detect(\"London\"))\n\nggplot() +\n geom_sf(data = london_icbs) +\n geom_sf(data = st_centroid(london_icbs)) +\n theme_void()"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows",
- "title": "System Dynamics in health and care",
- "section": "Longer Time Periods - flows",
- "text": "Longer Time Periods - flows\nUse lubridate::floor_date to generate the date at start of week/month\n\nadmit_wk <- spell_dates |> \n mutate(week_start = floor_date(\n date_admit, unit = \"week\", week_start = 1 # start week on Monday\n )) |> \n count(week_start) # could add other parameters such as provider code, TFC etc\n\nhead(admit_wk)\n\n\n\n# A tibble: 6 × 2\n week_start n\n <date> <int>\n1 2021-12-27 52\n2 2022-01-03 196\n3 2022-01-10 192\n4 2022-01-17 223\n5 2022-01-24 157\n6 2022-01-31 187\n\n\n\nMight run SD model in weeks or months - e.g. months for care homes Use lubridate to create new variable with start date of week/month/year etc"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1",
+ "title": "Coffee and Coding",
+ "section": "Working with geospatial data frames",
+ "text": "Working with geospatial data frames\nSummarising the data will combine the geometries.\n\nlondon_icbs |>\n summarise(area = sum(Shape__Area)) |>\n # and use geospatial functions to create calculations using the geometry\n mutate(new_area = st_area(geometry), .before = \"geometry\")\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -0.5102803 ymin: 51.28676 xmax: 0.3340241 ymax: 51.69188\nGeodetic CRS: WGS 84\n# A tibble: 1 × 3\n area new_area geometry\n* <dbl> [m^2] <MULTIPOLYGON [°]>\n1 1573336388. 1567995610. (((-0.3314819 51.43935, -0.3306676 51.43889, -0.33118…\n\n\n Why the difference in area?\n\n We are using a simplified geometry, so calculating the area will be slightly inaccurate. The original area was calculated on the non-simplified geometries."
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy",
- "title": "System Dynamics in health and care",
- "section": "Longer Time Periods - occupancy",
- "text": "Longer Time Periods - occupancy\nKey dates to include the dates at the start and end of each time period\n\n\n\ndate_start <- dmy(03012022) # first Monday of the year\ndate_end <- dmy(01012023)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"week\"))\n\nkeydates <- data.frame(wk_start = c(seq(date_start, \n by = \"week\", \n length.out=run_len))) |> \n mutate(\n wk_end = wk_start + 6) # last date in time period\n\n\n\n\n wk_start wk_end\n1 2022-01-03 2022-01-09\n2 2022-01-10 2022-01-16\n3 2022-01-17 2022-01-23\n4 2022-01-24 2022-01-30\n5 2022-01-31 2022-02-06\n6 2022-02-07 2022-02-13\n\n\n\n\nModel might make more sense to run in weeks or months (e.g. care home), so list of keydates need a start date and end date for each time period."
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data",
+ "title": "Coffee and Coding",
+ "section": "Creating our own geospatial data",
+    "text": "Creating our own geospatial data\n\nlocation_raw <- postcode_lookup(\"B2 4BJ\")\nglimpse(location_raw)\n\nRows: 1\nColumns: 40\n$ postcode                             <chr> \"B2 4BJ\"\n$ quality                              <int> 1\n$ eastings                             <int> 406866\n$ northings                            <int> 286775\n$ country                              <chr> \"England\"\n$ nhs_ha                               <chr> \"West Midlands\"\n$ longitude                            <dbl> -1.90033\n$ latitude                             <dbl> 52.47887\n$ european_electoral_region            <chr> \"West Midlands\"\n$ primary_care_trust                   <chr> \"Heart of Birmingham Teaching\"\n$ region                               <chr> \"West Midlands\"\n$ lsoa                                 <chr> \"Birmingham 138A\"\n$ msoa                                 <chr> \"Birmingham 138\"\n$ incode                               <chr> \"4BJ\"\n$ outcode                              <chr> \"B2\"\n$ parliamentary_constituency           <chr> \"Birmingham, Ladywood\"\n$ parliamentary_constituency_2024      <chr> \"Birmingham Ladywood\"\n$ admin_district                       <chr> \"Birmingham\"\n$ parish                               <chr> \"Birmingham, unparished area\"\n$ admin_county                         <lgl> NA\n$ date_of_introduction                 <chr> \"198001\"\n$ admin_ward                           <chr> \"Ladywood\"\n$ ced                                  <lgl> NA\n$ ccg                                  <chr> \"NHS Birmingham and Solihull\"\n$ nuts                                 <chr> \"Birmingham\"\n$ pfa                                  <chr> \"West Midlands\"\n$ admin_district_code                  <chr> \"E08000025\"\n$ admin_county_code                    <chr> \"E99999999\"\n$ admin_ward_code                      <chr> \"E05011151\"\n$ parish_code                          <chr> \"E43000250\"\n$ parliamentary_constituency_code      <chr> \"E14000564\"\n$ parliamentary_constituency_2024_code <chr> \"E14001096\"\n$ ccg_code                             <chr> \"E38000258\"\n$ ccg_id_code                          <chr> \"15E\"\n$ ced_code                             <chr> \"E99999999\"\n$ nuts_code                            <chr> \"TLG31\"\n$ lsoa_code                            <chr> \"E01033620\"\n$ msoa_code                            <chr> \"E02006899\"\n$ lau2_code                            <chr> \"E08000025\"\n$ pfa_code                             <chr> \"E23000014\"\n\n\n\n\nlocation <- location_raw |>\n  st_as_sf(coords = c(\"eastings\", \"northings\"), crs = 27700) |>\n  select(postcode, ccg) |>\n  st_transform(crs = 4326)\n\nlocation\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: POINT\nDimension:     XY\nBounding box:  xmin: -1.900335 ymin: 52.47886 xmax: -1.900335 ymax: 52.47886\nGeodetic CRS:  WGS 84\n  postcode                         ccg                   geometry\n1   B2 4BJ NHS Birmingham and Solihull POINT (-1.900335 52.47886)"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods",
- "title": "System Dynamics in health and care",
- "section": "Longer Time Periods",
- "text": "Longer Time Periods\nMore logic required if working in weeks or months - can only be in one place at any given time\n\n# flag for occupancy\nactivity_period <- case_when(\n \n # creates 1 flag if resident for complete week\n df$date_admit < keydates$wk_start[i] & df$date_discharge > keydates$wk_end[i] ~ 1,\n TRUE ~ 0)\n\n\nAnd a little bit more logic\nOccupancy requires the patient to have been admitted before the start of the week/month, and discharged after the end of the week/month"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts",
+ "title": "Coffee and Coding",
+ "section": "Creating a geospatial data frame for all NHS Trusts",
+ "text": "Creating a geospatial data frame for all NHS Trusts\n\n\n\n# using the NHSRtools package\n# remotes::install_github(\"NHS-R-Community/NHSRtools\")\ntrusts <- ods_get_trusts() |>\n filter(status == \"Active\") |>\n select(name, org_id, post_code) |>\n geocode(postalcode = \"post_code\") |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\ntrusts |>\n leaflet() |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers(popup = ~name)"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data",
- "title": "System Dynamics in health and care",
- "section": "Applying the data",
- "text": "Applying the data\n\n\nHow to apply this wrangling of data to the system dynamic model?\nAdmissions data used as an input to the flow - could be reduced to a single figure (average), or there may be variation by season/day of week etc.\nOccupancy (and discharges) used to verify the model output against known data."
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location",
+ "title": "Coffee and Coding",
+ "section": "What are the nearest trusts to our location?",
+ "text": "What are the nearest trusts to our location?\n\nnearest_trusts <- trusts |>\n mutate(\n distance = st_distance(geometry, location)[, 1]\n ) |>\n arrange(distance) |>\n head(5)\n\nnearest_trusts\n\nSimple feature collection with 5 features and 4 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.9384 ymin: 52.4533 xmax: -1.886282 ymax: 52.48764\nGeodetic CRS: WGS 84\n# A tibble: 5 × 5\n name org_id post_code geometry distance\n <chr> <chr> <chr> <POINT [°]> [m]\n1 BIRMINGHAM WOMEN'S AND CH… RQ3 B4 6NH (-1.894241 52.4849) 789.\n2 BIRMINGHAM AND SOLIHULL M… RXT B1 3RB (-1.917663 52.48416) 1313.\n3 BIRMINGHAM COMMUNITY HEAL… RYW B7 4BN (-1.886282 52.48754) 1356.\n4 SANDWELL AND WEST BIRMING… RXK B18 7QH (-1.930203 52.48764) 2246.\n5 UNIVERSITY HOSPITALS BIRM… RRK B15 2GW (-1.9384 52.4533) 3838."
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps",
- "title": "System Dynamics in health and care",
- "section": "Next Steps",
- "text": "Next Steps\n\nGeneralise function to a state where it can be used by others - onto Github\nTurn this into a package\nOpen-source SD models and interfaces - R Shiny or Python"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts",
+ "title": "Coffee and Coding",
+ "section": "Let’s find driving routes to these trusts",
+ "text": "Let’s find driving routes to these trusts\n\nroutes <- nearest_trusts |>\n mutate(\n route = map(geometry, ~ osrmRoute(location, st_coordinates(.x)))\n ) |>\n st_drop_geometry() |>\n rename(straight_line_distance = distance) |>\n unnest(route) |>\n st_as_sf()\n\nroutes\n\nSimple feature collection with 5 features and 8 fields\nGeometry type: LINESTRING\nDimension: XY\nBounding box: xmin: -1.93846 ymin: 52.45316 xmax: -1.88527 ymax: 52.49279\nGeodetic CRS: WGS 84\n# A tibble: 5 × 9\n name org_id post_code straight_line_distance src dst duration distance\n <chr> <chr> <chr> [m] <chr> <chr> <dbl> <dbl>\n1 BIRMING… RQ3 B4 6NH 789. 1 dst 5.77 3.09\n2 BIRMING… RXT B1 3RB 1313. 1 dst 6.84 4.14\n3 BIRMING… RYW B7 4BN 1356. 1 dst 7.59 4.29\n4 SANDWEL… RXK B18 7QH 2246. 1 dst 8.78 4.95\n5 UNIVERS… RRK B15 2GW 3838. 1 dst 10.6 4.67\n# ℹ 1 more variable: geometry <LINESTRING [°]>"
},
{
- "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions",
- "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions",
- "title": "System Dynamics in health and care",
- "section": "Questions, comments, suggestions?",
- "text": "Questions, comments, suggestions?\n\n\n\nPlease get in touch!\n\nSally.Thompson37@nhs.net\n\n\n\nNHS-R conference 2023"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes",
+ "title": "Coffee and Coding",
+ "section": "Let’s show the routes",
+ "text": "Let’s show the routes\n\nleaflet(routes) |>\n addTiles() |>\n addMarkers(data = location) |>\n addPolylines(color = \"black\", weight = 3, opacity = 1) |>\n addCircleMarkers(data = nearest_trusts, radius = 4, opacity = 1, fillOpacity = 1)"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#why",
- "href": "presentations/2024-05-16_store-data-safely/index.html#why",
- "title": "Store Data Safely",
- "section": "Why?",
- "text": "Why?\nBecause:\n\ndata may be sensitive\nGitHub was designed for source control of code\nGitHub has repository file-size limits\nit makes data independent from code\nit prevents repetition"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones",
+ "title": "Coffee and Coding",
+ "section": "We can use {osrm} to calculate isochrones",
+ "text": "We can use {osrm} to calculate isochrones\n\n\n\niso <- osrmIsochrone(location, breaks = seq(0, 60, 15), res = 10)\n\nisochrone_ids <- unique(iso$id)\n\npal <- colorFactor(\n viridis::viridis(length(isochrone_ids)),\n isochrone_ids\n)\n\nleaflet(location) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~ pal(id),\n color = \"#000000\",\n weight = 1\n )"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#other-approaches",
- "href": "presentations/2024-05-16_store-data-safely/index.html#other-approaches",
- "title": "Store Data Safely",
- "section": "Other approaches",
- "text": "Other approaches\nTo prevent data commits:\n\nuse a .gitignore file (*.csv, etc)\nuse Git hooks\navoid ‘add all’ (git add .) when staging\nensure thorough reviews of (small) pull-requests"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones",
+ "title": "Coffee and Coding",
+ "section": "What trusts are in the isochrones?",
+ "text": "What trusts are in the isochrones?\nThe summarise() function will “union” the geometry\n\nsummarise(iso)\n\nSimple feature collection with 1 feature and 0 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: -2.913575 ymin: 51.98062 xmax: -0.8502164 ymax: 53.1084\nGeodetic CRS: WGS 84\n geometry\n1 POLYGON ((-1.541014 52.9693..."
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data",
- "href": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data",
- "title": "Store Data Safely",
- "section": "What if I committed data?",
- "text": "What if I committed data?\n‘It depends’, but if it’s sensitive:\n\n‘undo’ the commit with git reset\nuse a tool like BFG to expunge the file from Git history\ndelete the repo and restart 🔥\n\nA data security breach may have to be reported."
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1",
+ "title": "Coffee and Coding",
+ "section": "What trusts are in the isochrones?",
+ "text": "What trusts are in the isochrones?\nWe can use this with a geo-filter to find the trusts in the isochrone\n\n# also works\ntrusts_in_iso <- trusts |>\n st_filter(\n summarise(iso),\n .predicate = st_within\n )\n\ntrusts_in_iso\n\nSimple feature collection with 31 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -2.793386 ymin: 52.19205 xmax: -1.10302 ymax: 53.01015\nGeodetic CRS: WGS 84\n# A tibble: 31 × 4\n name org_id post_code geometry\n * <chr> <chr> <chr> <POINT [°]>\n 1 BIRMINGHAM AND SOLIHULL MENTAL HE… RXT B1 3RB (-1.917663 52.48416)\n 2 BIRMINGHAM COMMUNITY HEALTHCARE N… RYW B7 4BN (-1.886282 52.48754)\n 3 BIRMINGHAM WOMEN'S AND CHILDREN'S… RQ3 B4 6NH (-1.894241 52.4849)\n 4 BIRMINGHAM WOMEN'S NHS FOUNDATION… RLU B15 2TG (-1.942861 52.45325)\n 5 BURTON HOSPITALS NHS FOUNDATION T… RJF DE13 0RB (-1.656667 52.81774)\n 6 COVENTRY AND WARWICKSHIRE PARTNER… RYG CV6 6NY (-1.48692 52.45659)\n 7 DERBYSHIRE HEALTHCARE NHS FOUNDAT… RXM DE22 3LZ (-1.512896 52.91831)\n 8 DUDLEY INTEGRATED HEALTH AND CARE… RYK DY5 1RU (-2.11786 52.48176)\n 9 GEORGE ELIOT HOSPITAL NHS TRUST RLT CV10 7DJ (-1.47844 52.51258)\n10 HEART OF ENGLAND NHS FOUNDATION T… RR1 B9 5ST (-1.828759 52.4781)\n# ℹ 21 more rows"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions",
- "href": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions",
- "title": "Store Data Safely",
- "section": "Data-hosting solutions",
- "text": "Data-hosting solutions\nWe’ll talk about two main options for The Strategy Unit:\n\nPosit Connect and the {pins} package\nAzure Data Storage\n\nWhich to use? It depends."
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2",
+ "title": "Coffee and Coding",
+ "section": "What trusts are in the isochrones?",
+ "text": "What trusts are in the isochrones?\n\n\n\nleaflet(trusts_in_iso) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~pal(id),\n color = \"#000000\",\n weight = 1\n )"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit",
- "href": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit",
- "title": "Store Data Safely",
- "section": "A platform by Posit",
- "text": "A platform by Posit\n\n\nhttps://connect.strategyunitwm.nhs.uk/"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius",
+ "title": "Coffee and Coding",
+ "section": "Doing the same but within a radius",
+ "text": "Doing the same but within a radius\n\n\n\nr <- 25000\n\ntrusts_in_radius <- trusts |>\n st_filter(\n location,\n .predicate = st_is_within_distance,\n dist = r\n )\n\n# transforming gives us a pretty smooth circle\nradius <- location |>\n st_transform(crs = 27700) |>\n st_buffer(dist = r) |>\n st_transform(crs = 4326)\n\nleaflet(trusts_in_radius) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = radius,\n color = \"#000000\",\n weight = 1\n )"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit",
- "href": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit",
- "title": "Store Data Safely",
- "section": "A package by Posit",
- "text": "A package by Posit\n\n\nhttps://pins.rstudio.com/"
+ "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading",
+ "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading",
+ "title": "Coffee and Coding",
+ "section": "Further reading",
+ "text": "Further reading\n\nGeocomputation with R\nr-spatial\n{sf} documentation\nLeaflet documentation\nTidy Geospatial Networks in R\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#basic-approach",
- "href": "presentations/2024-05-16_store-data-safely/index.html#basic-approach",
- "title": "Store Data Safely",
- "section": "Basic approach",
- "text": "Basic approach\ninstall.packages(\"pins\")\nlibrary(pins)\n\nboard_connect()\npin_write(board, data, \"pin_name\")\npin_read(board, \"user_name/pin_name\")"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing",
+ "title": "Unit testing in R",
+ "section": "What is testing?",
+ "text": "What is testing?\n\nSoftware testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation\nwikipedia"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#live-demo",
- "href": "presentations/2024-05-16_store-data-safely/index.html#live-demo",
- "title": "Store Data Safely",
- "section": "Live demo",
- "text": "Live demo\n\nLink RStudio to Posit Connect (authenticate)\nConnect to the board\nWrite a new pin\nCheck pin status and details\nPin versions\nUse pinned data\nUnpin your pin"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code",
+ "title": "Unit testing in R",
+ "section": "How can we test our code?",
+ "text": "How can we test our code?\n\n\nStatically\n\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\n\nDynamically"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it",
- "href": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it",
- "title": "Store Data Safely",
- "section": "Should I use it?",
- "text": "Should I use it?\n\n\n⚠️ {pins} is not great because:\n\nyou should not upload sensitive data!\nthere’s a file-size upload limit\npin organisation is a bit awkward (no subfolders)\n\n\n{pins} is helpful because:\n\nauthentication is straightforward\ndata can be versioned\nyou can control permissions\nthere are R and Python versions of the package"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1",
+ "title": "Unit testing in R",
+ "section": "How can we test our code?",
+ "text": "How can we test our code?\n\n\nStatically\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\nDynamically\n\n\n(by executing the code)\nsplit into functional and non-functional testing\ntesting can be manual, or automated\n\n\n\n\n\nnon-functional testing covers things like performance, security, and usability testing"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage",
- "href": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage",
- "title": "Store Data Safely",
- "section": "What is Azure Data Storage?",
- "text": "What is Azure Data Storage?\nMicrosoft cloud storage for unstructured data or ‘blobs’ (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\nHow is it different?\n\nNo hierarchy – although you can make pseudo-‘folders’ with the blobnames.\nAuthenticates with your Microsoft account."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests",
+ "title": "Unit testing in R",
+ "section": "Different types of functional tests",
+ "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\n\n\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements."
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage",
- "href": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage",
- "title": "Store Data Safely",
- "section": "Authenticating to Azure Data Storage",
- "text": "Authenticating to Azure Data Storage\n\nYou are all part of the “strategy-unit-analysts” group; this gives you read/write access to specific Azure storage containers.\nYou can store sensitive information like the container ID in a local .Renviron or .env file that should be ignored by git.\nUsing {AzureAuth}, {AzureStor} and your credentials, you can connect to the Azure storage container, upload files and download them, or read the files directly from storage!"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1",
+ "title": "Unit testing in R",
+ "section": "Different types of functional tests",
+ "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nUnit, Integration, and E2E testing are all things we can automate in code, whereas UAT testing is going to be manual"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables",
- "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables",
- "title": "Store Data Safely",
- "section": "Step 1: load your environment variables",
- "text": "Step 1: load your environment variables\nStore sensitive info in an .Renviron file that’s kept out of your Git history! The info can then be loaded in your script.\n.Renviron:\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nScript:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\nTip: reload .Renviron with readRenviron(\".Renviron\")"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2",
+ "title": "Unit testing in R",
+ "section": "Different types of functional tests",
+ "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nOnly focussing on unit testing in this talk, but the techniques/packages could be extended to integration testing. Often other tools (potentially specific tools) are needed for E2E testing."
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1",
- "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1",
- "title": "Store Data Safely",
- "section": "Step 1: load your environment variables",
- "text": "Step 1: load your environment variables\nIn the demo script we are providing, you will need these environment variables:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example",
+ "title": "Unit testing in R",
+ "section": "Example",
+ "text": "Example\nWe have a {shiny} app which grabs some data from a database, manipulates the data, and generates a plot.\n\n\nwe would write unit tests to check the data manipulation and plot functions work correctly (with pre-created sample/simple datasets)\nwe would write integration tests to check that the data manipulation function works with the plot function (with similar data to what we used for the unit tests)\nwe would write e2e tests to ensure that from start to finish the app grabs the data and produces a plot as required\n\n\n\nsimple (unit tests) to complex (e2e tests)"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure",
- "href": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure",
- "title": "Store Data Safely",
- "section": "Step 2: Authenticate with Azure",
- "text": "Step 2: Authenticate with Azure\n\n\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\",\n)\nThe first time you do this, you will have link to authenticate in your browser and a code in your terminal to enter. Use the browser that works best with your @mlcsu.nhs.uk account!"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid",
+ "title": "Unit testing in R",
+ "section": "Testing Pyramid",
+ "text": "Testing Pyramid\n\n\nImage source: The Testing Pyramid: Simplified for One and All headspin.io"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container",
- "href": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container",
- "title": "Store Data Safely",
- "section": "Step 3: Connect to container",
- "text": "Step 3: Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\nIf you get 403 error, delete your token and re-authenticate, try a different browser/incognito, etc.\nTo clear Azure tokens: AzureAuth::clean_token_directory()"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function",
+ "title": "Unit testing in R",
+ "section": "Let’s create a simple function…",
+ "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container",
- "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container",
- "title": "Store Data Safely",
- "section": "Interact with the container",
- "text": "Interact with the container\nIt’s possible to interact with the container via your browser!\nYou can upload and download files using the Graphical User Interface (GUI), login with your @mlcsu.nhs.uk account: https://portal.azure.com/#home\nAlthough it’s also cooler to interact via code… 😎"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1",
+ "title": "Unit testing in R",
+ "section": "Let’s create a simple function…",
+ "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1",
- "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1",
- "title": "Store Data Safely",
- "section": "Interact with the container",
- "text": "Interact with the container\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(\n container,\n \"LOCAL_FOLDERNAME/*\",\n \"FOLDERNAME_ON_AZURE\"\n)\n\n# Upload specific file to container\nAzureStor::storage_upload(\n container,\n \"data/ronald.jpeg\",\n \"newdir/ronald.jpeg\"\n)"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2",
+ "title": "Unit testing in R",
+ "section": "Let’s create a simple function…",
+ "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}\n\n\nThe Ten Rules of Defensive Programming in R"
},
- {
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container",
- "href": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container",
- "title": "Store Data Safely",
- "section": "Load csv files directly from Azure container",
- "text": "Load csv files directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by storing it in memory)\n\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\n\nparq_df <- arrow::read_parquet(parquet_in_memory)"
+ {
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2",
- "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2",
- "title": "Store Data Safely",
- "section": "Interact with the container",
- "text": "Interact with the container\n# Delete from Azure container (!!!)\nAzureStor::delete_storage_file(container, BLOB_NAME)"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
},
{
- "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve",
- "href": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve",
- "title": "Store Data Safely",
- "section": "What does this achieve?",
- "text": "What does this achieve?\n\nData is not in the repository, it is instead stored in a secure location\nCode can be open – sensitive information like Azure container name stored as environment variables\nLarge filesizes possible, other people can also access the same container.\nNaming conventions can help to keep blobs organised (these create pseudo-folders)\n\n\n\n\nLearn more about Data Science at The Strategy Unit"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
},
{
- "objectID": "blogs/posts/2023-04-26_alternative_remotes.html",
- "href": "blogs/posts/2023-04-26_alternative_remotes.html",
- "title": "Alternative remote repositories",
- "section": "",
- "text": "It’s great when someone send’s you a pull request on GitHub to fix bugs or add new features to your project, but you probably always want to check the other persons work in someway before merging that pull request.\nAll of the steps below are intended to be entered via a terminal.\nLet’s imagine that we have a GitHub account called example and a repository called test, and we use https rather than ssh.\n$ git remote get-url origin\n# https://github.com/example/test.git\nNow, let’s say we have someone who has submitted a Pull Request (PR), and their username is friend. We can add a new remote for their fork with\n$ git remote add friend https://github.com/friend/test.git\nHere, I name the remote exactly as per the persons GitHub username for no other reason than making it easier to track things later on. You could name this remote whatever you like, but you will need to make sure that the remote url matches their repository correctly.\nWe are now able to checkout their remote branch. First, we will want to fetch their work:\n# make sure to replace the remote name to what you set it to before\n$ git fetch friend\nNow, hopefully they have commited to a branch with a name that you haven’t used. Let’s say they created a branch called my_work. 
You can then simply run\n$ git switch friend/my_work\nThis should checkout the my_work branch locally for you.\nNow, if they have happened to use a branch name that you are already using, or more likely, directly commited to their own main branch, you will need to do checkout to a new branch:\n# replace friend as above to be the name of the remote, and main to be the branch\n# that they have used\n# replace their_work with whatever you want to call this branch locally\n$ git checkout friend/main -b their_work\nYou are now ready to run their code and check everything is good to merge!\nFinally, If you want to clean up your local repository you can remove the new branch that you checked out and the new remote with the following steps:\n# switch back to one of your branches, e.g. main\n$ git checkout main\n\n# then remove the branch that you created above\n$ git branch -D their_work\n\n# you can remove the remote\n$ git remote remove friend"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
},
{
- "objectID": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html",
- "href": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html",
- "title": "Advent of Code and Test Driven Development",
- "section": "",
- "text": "Advent of Code is an annual event, where daily coding puzzles are released from 1st – 24th December. We ran one of our fortnightly Coffee & Coding sessions introducing Advent of Code to people who code in the Strategy Unit, as well as the concept of test-driven development as a potential way of approaching the puzzles.\nTest-driven development (TDD) is an approach to coding which involves writing the test for a function BEFORE we write the function. This might seem quite counterintuitive, but it makes it easier to identify bugs 🐛 when they are introduced to our code, and ensures that our functions meet all necessary criteria. From my experience, this takes quite a long time to implement and can be quite tedious, but it is definitely worth it overall, especially as your project develops. Testing is also recommended in the NHS Reproducible Analytical Pipeline (RAP) guidelines.\nAn interesting thing to note about TDD is that we’re always expecting our first test to fail, and indeed failing tests are useful and important! If we wrote tests that just passed all the time, this would not be useful at all for our code.\nThe way that Advent of Code is structured, with test data for each puzzle and an expected test result, makes it very amenable to a test-driven approach. In order to support this, Matt and I created template repositories for a test-driven approach to Advent of Code, in Python and in R.\nOur goal when setting this up was to introduce others in the Strategy Unit to both TDD and Advent of Code. Advent of code can be challenging and I personally struggle to get past the first week, but it encourages creative (and maybe even fun?!) approaches to coding problems. I’m glad that we had the chance to explore some of the puzzles together in Coffee & Coding – it was interesting to see so many different approaches to the same problem, and hopefully it also gave us all the chance to practice writing tests."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html",
- "href": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html",
- "title": "Data Science @ The Strategy Unit",
- "section": "",
- "text": "import os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = 
pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5",
+ "title": "Unit testing in R",
+ "section": "… and create our first test",
+ "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})\n\nTest passed 😸"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html",
- "title": "RStudio Tips and Tricks",
- "section": "",
- "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions",
+ "title": "Unit testing in R",
+ "section": "other expect_*() functions…",
+ "text": "other expect_*() functions…\n\ntest_that(\"my_function correctly divides values\", {\n expect_lt(\n my_function(4, 2),\n 10\n )\n expect_gt(\n my_function(1, 4),\n 0.2\n )\n expect_length(\n my_function(c(4, 1), c(2, 4)),\n 2\n )\n})\n\nTest passed 🎉\n\n\n\n{testthat} documentation"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding",
- "title": "RStudio Tips and Tricks",
- "section": "",
- "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert",
+ "title": "Unit testing in R",
+ "section": "Arrange, Act, Assert",
+ "text": "Arrange, Act, Assert\n\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n # \n #\n #\n\n # act\n #\n\n # assert\n #\n})"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance",
- "title": "RStudio Tips and Tricks",
- "section": "Official guidance",
- "text": "Official guidance\nPosit is the company who build and maintain RStudio. They host a number of cheatsheets on their website, including one for RStudio. They also have a more in-depth user guide."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1",
+ "title": "Unit testing in R",
+ "section": "Arrange, Act, Assert",
+ "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\n\n\nto create sample values\ncreate fake/temporary files\nset random seed\nset R options/environment variables\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n #\n\n # assert\n #\n})"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette",
- "title": "RStudio Tips and Tricks",
- "section": "Command palette",
- "text": "Command palette\nRStudio has a powerful built-in Command Palette, which is a special search box that gives instant access to features and settings without needing to find them in the menus. Many of the tips and tricks we discussed can be found by searching in the Palette. Open it with the keyboard shortcut Ctrl + Shift + P.\n\n\n\nOpening the Command Palette.\n\n\nFor example, let’s say you forgot how to restart R. If you open the Command Palette and start typing ‘restart’, you’ll see the option ‘Restart R Session’. Clicking it will do exactly that. Handily, the Palette also displays the keyboard shortcut (Control + Shift + F10 on Windows) as a reminder.\nAs for settings, a search for ‘rainbow’ in the Command Palette will find ‘Use rainbow parentheses’, an option to help prevent bracket-mismatch errors by colouring pairs of parentheses. What’s nice is that the checkbox to toggle the feature appears right there in the palette so you can change it immediately.\nI refer to menu paths and keyboard shortcuts in the rest of this post, but bear in mind that you can use the Command Palette instead."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2",
+ "title": "Unit testing in R",
+ "section": "Arrange, Act, Assert",
+ "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n #\n})"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#options",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#options",
- "title": "RStudio Tips and Tricks",
- "section": "Options",
- "text": "Options\nIn general, most settings can be found under Tools > Global Options… and many of these are discussed in the rest of this post.\n\n\n\nAdjusting workspace and history settings.\n\n\nBut there are a few settings in particular that we recommend you change to help maximise reproducibility and reduce the chance of confusion. Under General > Basic, uncheck ‘Restore .RData into workspace at startup’ and select ‘Never’ from the dropdown options next to ‘Save workspace to .RData on exit’. These options mean you start with the ‘blank slate’ of an empty environment when you open a project, allowing you to rebuild objects from scratch1."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3",
+ "title": "Unit testing in R",
+ "section": "Arrange, Act, Assert",
+ "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\nwe assert that the actual results match our expected results\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts",
- "title": "RStudio Tips and Tricks",
- "section": "Keyboard shortcuts",
- "text": "Keyboard shortcuts\nYou can speed up day-to-day coding with keyboard shortcuts instead of clicking buttons in the interface.\nYou can see some available shortcuts in RStudio if you navigate to Help > Keyboard Shortcuts Help, or use the shortcut Alt + Shift + K (how meta). You can go to Help > Modify Keyboard Shortcuts… to search all shortcuts and change them to what you prefer2.\nWe discussed a number of handy shortcuts that we use frequently3. You can:\n\nre-indent lines to the appropriate depth with Control + I\nreformat code with Control + Shift + A\nturn one or more lines into a comment with Control + Shift + C\ninsert the pipe operator (%>% or |>4) with Control + Shift + M5\ninsert the assignment arrow (<-) with Alt + - (hyphen)\nhighlight a function in the script or console and press F1 to open the function documentation in the ‘Help’ pane\nuse ‘Find in Files’ to search for a particular variable, function or string across all the files in your project, with Control + Shift + F"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed",
+ "title": "Unit testing in R",
+ "section": "Our test failed!?! 😢",
+ "text": "Our test failed!?! 😢\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})\n\n── Failure: my_function works ──────────────────────────────────────────────────\n`actual` not equal to `expected`.\n1/1 mismatches\n[1] 0.714 - 0.714 == 7.14e-07\n\n\nError:\n! Test failed"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes",
- "title": "RStudio Tips and Tricks",
- "section": "Themes",
- "text": "Themes\nYou can change a number of settings to alter RStudio’s theme, colours and fonts to whatever you desire.\nYou can change the default theme in Tools > Global Options… > Appearance > Editor theme and select one from the pre-installed list. You can upload new themes by clicking the ‘Add’ button and selecting a theme from your computer. They typically have the file extension .rsthemes and can be downloaded from the web, or you can create or tweak one yourself. The {rsthemes} package has a number of options and also allows you to switch between themes and automatically switch between light and dark themes depending on the time of day.\n\n\n\nCustomising the appearance and font.\n\n\nIn the same ‘Appearance’ submenu as the theme settings, you can find an option to change fonts. Monospace fonts, ones where each character takes up the same width, will appear here automatically if you’ve installed them on your computer. One popular font for coding is Fira Code, which has the special property of converting certain sets of characters into ‘ligatures’, which some people find easier to read. For example, the base pipe will appear as a rightward-pointing arrow rather than its constituent vertical-pipe and greater-than symbol (|>)."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue",
+ "title": "Unit testing in R",
+ "section": "Tolerance to the rescue 🙂",
+ "text": "Tolerance to the rescue 🙂\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected, tolerance = 1e-6)\n})\n\nTest passed 🎊\n\n\n\n(this is a slightly artificial example; usually the default tolerance is good enough)"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes",
- "title": "RStudio Tips and Tricks",
- "section": "Panes",
- "text": "Panes\n\nLayout\nThe structural layout of RStudio’s panes can be adjusted. One simple thing you can do is minimise and maximise each pane by clicking the window icons in their upper-right corners. This is useful when you want more screen real-estate for a particular pane.\nYou can move pane locations too. Click the ‘Workspace Panes’ button (a square with four more inside it) at the top of the IDE to see a number of settings. For example, you can select ‘Console on the right’ to move the R console to the upper-right pane, which you may prefer for maximising the vertical space in which code is shown. You could also click Pane Layout… in this menu to be taken to Tools > Global Options… > Pane layout, where you can click ‘Add Column’ to insert new script panes that allow you to inspect and write multiple files side-by-side.\n\n\nScript navigation\nThe script pane in particular has a nice feature for navigating through sections of your script or Quarto/R Markdown files. Click the ‘Show Document Outline’ button or use the keyboard shortcut Control + Shift + O to slide open a tray that provides a nice indented list of all the sections and function definitions in your file.\nSection headers are auto-detected in a Quarto or R Markdown document wherever the Markdown header markup has been used: one hashmark (#) for a level 1 header, two for level 2, and so on. To add section headers to an R Script, add at least four hyphens after a commented line that starts with #. Use two or more hashes at the start of the comment to increase the nestedness of that section.\n\n# Header ------------------------------------------------------------------\n\n## Section ----\n\n### Subsection ----\n\nNote that Ctrl + Shift + R will open a dialog box for you to input the name of a section header, which will be inserted and automatically padded to 75 characters to provide a strong visual cue between sections.\nAs well as the document outline, there’s also a reminder in the lower-left of the script pane that gives the name of the section that your cursor is currently in. A symbol is also shown: a hashmark means it’s a headed section and an ‘f’ means it’s a function definition. You can click this to jump to other sections.\n\n\n\nNavigating with headers in the R script pane.\n\n\n\n\nBackground jobs\nPerhaps an under-used pane is ‘Background jobs’. This is where you can run a separate R process that keeps your R console free. Go to Tools > Background Jobs > Start Background Job… to expose this tab if it isn’t already listed alongside the R console.\nWhy might you want to do this? As I write this post, there’s a background process to detect changes to the Quarto document that I’m writing and then update a preview I have running in the browser. You can do something similar for Shiny apps. You can continue to develop your app and test things in the console and the app preview will update on save. You won’t need to keep hitting the ‘Render’ or ‘Run app’ button every time you make a change."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases",
+ "title": "Unit testing in R",
+ "section": "Testing edge cases",
+ "text": "Testing edge cases\n\n\nRemember the validation steps we built into our function to handle edge cases?\n\nLet’s write tests for these edge cases:\nwe expect errors\n\n\ntest_that(\"my_function works\", {\n expect_error(my_function(5, 0))\n expect_error(my_function(\"a\", 3))\n expect_error(my_function(3, \"a\"))\n expect_error(my_function(1:2, 4))\n})\n\nTest passed 🎊"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand",
- "title": "RStudio Tips and Tricks",
- "section": "Magic wand",
- "text": "Magic wand\nThere’s a miscellany of useful tools available when you click the ‘magic wand’ button in the script pane.\n\n\n\nAbracadabra! Casting open the ‘magic wand’ menu.\n\n\nThis includes:\n\n‘Rename in Scope’, which is like find-and-replace but you only change instances with the same ‘scope’, so you could select the variable x, go to Rename in Scope and then you can edit all instances of the variable in the document and change them at the same time (e.g. to rename them)\n‘Reflow Comment’, which you can click after highlighting a comments block to have the comments automatically line-break at the maximum width\n‘Insert Roxygen Skeleton’, which you can click when your cursor is inside the body of a function you’ve written and a {roxygen2} documentation template will be added above your function with the @param argument names pre-filled\n\nAlong with ‘Comment/Uncomment Lines’, ‘Reindent Lines’ and ‘Reformat Lines’, mentioned above in the keyboard shortcuts section."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example",
+ "title": "Unit testing in R",
+ "section": "Another (simple) example",
+ "text": "Another (simple) example\n\n\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\nConsider this function - there is branched logic, so we need to carefully design tests to validate the logic works as intended."
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up",
- "title": "RStudio Tips and Tricks",
- "section": "Wrapping up",
- "text": "Wrapping up\nTime was limited in our discussion. There are so many more tips and tricks that we didn’t get to. Let us know what we missed, or what your favourite shortcuts and settings are."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1",
+ "title": "Unit testing in R",
+ "section": "Another (simple) example",
+ "text": "Another (simple) example\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\n\ntest_that(\"it returns 'x' if x is bigger than y\", {\n expect_equal(my_new_function(4, 3), \"x\")\n})\n\nTest passed 🎉\n\ntest_that(\"it returns 'y' if y is bigger than x\", {\n expect_equal(my_new_function(3, 4), \"y\")\n expect_equal(my_new_function(3, 3), \"y\")\n})\n\nTest passed 🥳"
},
{
- "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes",
- "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes",
- "title": "RStudio Tips and Tricks",
- "section": "Footnotes",
- "text": "Footnotes\n\n\nFor the same reason it’s a good idea to restart R on a frequent basis. You may assume that an object x in your environment was made in a certain way and contains certain information, but does it? What if you overwrote it at some point and forgot? Best to wipe the slate clean and rebuild it from scratch. Jenny Bryan has written an explainer.↩︎\nYou can ‘snap focus’ to the script and console panes with the pre-existing shortcuts Control + 1 and Control + 2. My next most-used pane is the terminal, so I’ve re-mapped the shortcut to Control + 3.↩︎\nThe classic shortcuts of select-all (Control + A), cut (Control + X), copy (Control + C), paste (Control + V), undo (Control + Z) and redo (Control + Shift + Z) are all available when editing.↩︎\nNote that you can set the default pipe to the base-R version (|>) by checking the box at Tools > Global Options… > Code > Use native pipe operator.↩︎\nProbably ‘M’ for {magrittr}, the name of the package that contains the %>% incarnation of the operator.↩︎"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests",
+ "title": "Unit testing in R",
+ "section": "How to design good tests",
+ "text": "How to design good tests\na non-exhaustive list\n\nconsider all of the function’s arguments\nwhat are the expected values for these arguments?\nwhat are unexpected values, and are they handled?\nare there edge cases that need to be handled?\nhave you covered all of the different paths in your code?\nhave you managed to create tests that check the range of results you expect?"
},
{
- "objectID": "blogs/posts/2023-04-26-reinstalling-r-packages.html",
- "href": "blogs/posts/2023-04-26-reinstalling-r-packages.html",
- "title": "Reinstalling R Packages",
- "section": "",
- "text": "R 4.3.0 was released last week. Anytime you update R you will probably find yourself in the position where no packages are installed. This is by design - the packages that you have installed may need to be updated and recompiled to work under new versions of R.\nYou may find yourself wanting to have all of the packages that you previously used, so one approach that some people take is to copy the previous library folder to the new version’s folder. This isn’t a good idea and could potentially break your R install.\nAnother approach would be to export the list of packages in R before updating and then using that list after you have updated R. This can cause issues though if you install from places other than CRAN, e.g. Bioconductor, or from GitHub.\nSome of these approaches are discussed on the RStudio Community Forum. But I prefer an approach of having a “spring clean”, instead only installing the packages that I know that I need.\nI maintain a list of the packages that I use as a gist. Using this, I can then simply run this script on any new R install. In fact, if you click the “raw” button on the gist, and copy that url, you can simply run\nsource(\"https://gist.githubusercontent.com/tomjemmett/c105d3e0fbea7558088f68c65e68e1ed/raw/a1db4b5fa0d24562d16d3f57fe8c25fb0d8aa53e/setup.R\")\nGenerally, sourcing a url is a bad idea - the reason for this is if it’s not a link that you control, then someone could update the contents and run arbitrary code on your machine. In this case, I’m happy to run this as it’s my own gist, but you should be mindful if running it yourself!\nIf you look at the script I first install a number of packages from CRAN, then I install packages that only exist on GitHub."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests",
+ "title": "Unit testing in R",
+ "section": "But, why create tests?",
+ "text": "But, why create tests?\nanother non-exhaustive list\n\ngood tests will help you uncover existing issues in your code\nwill defend you from future changes that break existing functionality\nwill alert you to changes in dependencies that may have changed the functionality of your code\ncan act as documentation for other developers"
},
{
- "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html",
- "href": "blogs/posts/2024-08-08-map-and-nest/index.html",
- "title": "Map and Nest",
- "section": "",
- "text": "I want to share a framework that I like using occasionally for data analysis. It’s the nest-and-map and it’s helped me countless times when I’m working with related datasets. By combining {purrr} mapping with {tidyr} nesting, I can keep my analysis steps linked, allowing me to easily track from a summary or plot, back to the original data.\nThe main functions we’ll need are"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions",
+ "title": "Unit testing in R",
+ "section": "Testing complex functions",
+ "text": "Testing complex functions\n\n\n\nmy_big_function <- function(type) {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n df <- tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n\n conditions <- read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date) |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}\n\n\nWhere do you even begin to start writing tests for something so complex?\n\n\nNote: to get the code on the left to fit on one page, I skipped including a few library calls\n\nlibrary(tidyverse)\nlibrary(DBI)"
},
{
- "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#example-on-nhs-workforce-statistics",
- "href": "blogs/posts/2024-08-08-map-and-nest/index.html#example-on-nhs-workforce-statistics",
- "title": "Map and Nest",
- "section": "Example on NHS workforce statistics",
- "text": "Example on NHS workforce statistics\nThe NHS workforce statistics are official statistics published monthly for England.\n\nstaff_group <- readRDS(file = \"workforce_staff_group.rds\")\n\nI want to perform an analysis for each of the 42 integrated care systems (ICS). The {tidyr} nest() function creates a list-column, where each cell contains a mini dataframe for each grouping.\nLet’s group by ICS, and call the nested data column raw_data.\n\ngroup_by_ics <- staff_group |>\n tidyr::nest(raw_data = -ics_name)\n\nThe new column is a list-column, with each cell containing an entire tibble of data relating to that individual ICS.\n\n#' echo: false\nhead(group_by_ics)\n\n# A tibble: 6 × 2\n ics_name raw_data \n <chr> <list> \n1 South East London <tibble [8 × 6]> \n2 North East London <tibble [7 × 6]> \n3 North Central London <tibble [12 × 6]>\n4 North West London <tibble [10 × 6]>\n5 South West London <tibble [8 × 6]> \n6 Devon <tibble [7 × 6]> \n\n\nWe can grab these mini datasets in the usual way and explore them interactively.\n\ngroup_by_ics$raw_data[[1]]\n\n# A tibble: 8 × 6\n organisation_name total hchs_doctors nurses_health_visitors midwives\n <chr> <dbl> <dbl> <dbl> <dbl>\n1 Total 58394 7108 14939 926\n2 Guy's and St Thomas' NHS F… 21361 3003 6196 281\n3 King's College Hospital NH… 13158 2443 4202 375\n4 Lewisham and Greenwich NHS… 6617 979 2103 271\n5 London Ambulance Service N… 7050 4 44 0\n6 NHS South East London ICB 617 9 43 0\n7 Oxleas NHS Foundation Trust 4094 200 1196 0\n8 South London and Maudsley … 5496 471 1155 0\n# ℹ 1 more variable: ambulance_staff <dbl>\n\n\nNext, let’s apply some simple processing, say converting absolute numbers into percentages, to each of the ICSs in turn.\nWe use mutate() to create a new list-column staff_percent and map() to apply the processing function to each cell in turn. 
1\n\n\nSee function definition for convert_percent()\n\n\n#' Convert percent\n#' @param raw_staff Tibble containing organisation_name, total and a number of staff categories\n#' @return Tibble like raw_staff but with staff categories represented as percentages rather than absolute numbers\nconvert_percent <- function(staff){\n staff |>\n dplyr::mutate(dplyr::across(.cols = -c(organisation_name, total),\n .fns = \\(x)x/total)) |>\n dplyr::rename(\"Doctors\" = \"hchs_doctors\",\n \"Nurses\" = \"nurses_health_visitors\",\n \"Ambulance staff\" = \"ambulance_staff\",\n \"Midwives\" = \"midwives\")\n}\n\n\n\nprocessed_staff <-\ngroup_by_ics |>\n dplyr::mutate(\n staff_percent = purrr::map(raw_data, convert_percent)\n )\n\nWhere I think this map-and-nest process really comes into its own is creating plots. Often, I find myself wanting to create a couple of different plots for each grouping, and then optionally save the plots with sensible names. Particularly in the analysis stage, I like having these plots in the same row as the raw data, so I can quickly compare and validate.\nI’ve created two functions, plot_barchart() and plot_waffle(), which take the data and create charts.\n\n\nSee definition for plot_barchart() & plot_waffle()\n\n\n#' Plot barchart\n#' Makes a bar chart of staff percentages by organisation\n#' @param df tibble of staff data in percent format\nplot_barchart <- function(df) {\n df |>\n dplyr::filter(organisation_name != \"Total\") |>\n dplyr::select(-total) |>\n tidyr::pivot_longer(cols = -c(organisation_name), names_to = \"job\", values_to = \"percent\") |>\n ggplot2::ggplot(ggplot2::aes(x = percent, y = organisation_name, fill = job)) +\n ggplot2::geom_col(position = \"dodge\") + \n ggplot2::scale_x_continuous(labels = scales::percent_format(scale = 100)) +\n ggplot2::labs(x = \"\", y = \"\") +\n StrategyUnitTheme::scale_fill_su() + \n ggplot2::theme_minimal() + \n ggplot2::theme(legend.title = ggplot2::element_blank())\n}\n\n#' Plot waffle\n#' Makes a waffle chart to visualise staff breakdown at an ICS level\n#' @param raw_staff count data of staff\n#' @param title Title for the graphic\nplot_waffle <- function(raw_staff, title) {\nwaffle_data <-\nraw_staff |>\n dplyr::filter(organisation_name == \"Total\") |>\n dplyr::select(-total, -organisation_name) |>\n tidyr::pivot_longer(cols = dplyr::everything(), names_to = \"names\", values_to = \"vals\") |>\n dplyr::mutate(vals = round(vals / 100))\n\nggplot2::ggplot(waffle_data, ggplot2::aes(fill = names, values = vals)) +\n waffle::geom_waffle(n_rows = 8, size = 0.33, colour = \"white\") +\n ggplot2::coord_equal() +\n ggplot2::theme_void() + \n ggplot2::theme(legend.title = ggplot2::element_blank()) +\n ggplot2::ggtitle(title)\n}\n\n\nAgain, using mutate() I can create a new column called barchart and I can map() the function plot_barchart(), applying it to each row in turn.\n\ngraphs <-\nprocessed_staff |>\n dplyr::mutate(\n barchart = purrr::map(staff_percent, plot_barchart)\n ) \n\nThe resulting column barchart is again a list-column, but this time instead of containing a tibble, it holds a ggplot object. A whole ggplot in a single cell. 2\nIf we want to pass two arguments to our function, we can replace map() with map2(). Here we’re using map2() to pass the ics_name column to use as a title in our waffle plot. 3\n\ngraphs <-\nprocessed_staff |>\n dplyr::mutate(\n waffle = purrr::map2(raw_data, ics_name, \n \\(data, title) plot_waffle(data, title)\n )\n ) \n\n\n\n\nAn example bar chart plot"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions",
+ "title": "Unit testing in R",
+ "section": "Split the logic into smaller functions",
+ "text": "Split the logic into smaller functions\nFunction to get the data from the database\n\nget_data_from_sql <- function() {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n}"
},
{
- "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#putting-it-all-together",
- "href": "blogs/posts/2024-08-08-map-and-nest/index.html#putting-it-all-together",
- "title": "Map and Nest",
- "section": "Putting it all together",
- "text": "Putting it all together\nAll of these mutate() steps can actually be called in one step. Here’s the full workflow again, after a little refactor. I’ve also used pivot_longer() to move the two plotting columns into a single plot column. This will make it easier for me to generate nice filenames, and save the plots.\n\nresults <-\nstaff_group |>\n tidyr::nest(raw_data = -ics_name) |>\n dplyr::mutate(\n staff_percent = purrr::map(raw_data, convert_percent),\n barchart = purrr::map(staff_percent, plot_barchart),\n waffle = purrr::map2(raw_data, ics_name, \\(data, title) plot_waffle(data, title)) \n ) |>\n tidyr::pivot_longer(cols = c(barchart, waffle), names_to = \"plot_type\", values_to = \"plot\") |>\n dplyr::mutate(filename = glue::glue(\"{snakecase::to_snake_case(ics_name)}_{plot_type}.png\"))\n\nThe walk() family of functions in {purrr} is used when the function you’re applying does not return an object, but is being used for its side effect, for example reading or writing files.\nHere we call walk2(), passing in both the filename column and the plot column as arguments to save all the plots.\n\npurrr::walk2(\n results$filename,\n results$plot,\n \\(filename, plot) ggplot2::ggsave(file.path(\"plots\", filename), plot, width = 10, height = 6)\n)\n\nBy keeping everything together in one nested structure, I personally find it much easier to keep track of my analyses. If you’re doing a more complex or permanent analysis, you might want to consider setting up a more formal data processing pipeline, and following RAP principles."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1",
+ "title": "Unit testing in R",
+ "section": "Split the logic into smaller functions",
+ "text": "Split the logic into smaller functions\nFunction to get the relevant conditions\n\nget_conditions <- function(type) {\n read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n}"
},
{
- "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#footnotes",
- "href": "blogs/posts/2024-08-08-map-and-nest/index.html#footnotes",
- "title": "Map and Nest",
- "section": "Footnotes",
- "text": "Footnotes\n\n\nIn this example, we actually didn’t need to nest first. We could have performed the mutate() step on the full dataset.↩︎\nThis totally blew my mind the first time I saw it 🤯.↩︎\nWe’re mapping the two inputs to plot_waffle() with an anonymous function. This shorthand syntax for anonymous functions came in R v4.1.0. For compatibility with older versions of R, you’ll need the ~ operator. For the different ways you can specify functions in {purrr}, see the help file.↩︎"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2",
+ "title": "Unit testing in R",
+ "section": "Split the logic into smaller functions",
+ "text": "Split the logic into smaller functions\nFunction to combine the data and create a count by date\n\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}"
},
{
- "objectID": "about.html",
- "href": "about.html",
- "title": "About",
- "section": "",
- "text": "The Data Science team at the Strategy Unit comprises the following team members:\n\nChris Beeley\nMatt Dray\nOzayr Mohammed\nRhian Davies\nTom Jemmett\nYiWen Hon\n\nCurrent and previous projects of note include:\n\nWork supporting the New Hospitals Programme, including building a model for predicting the demand and capacity requirements of hospitals in the future, and a tool for mapping the evidence on this topic.\nThe Patient Experience Qualitative Data Categorisation project\nWork supporting the wider analytical community, through events/communities such as NHS-R and HACA."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3",
+ "title": "Unit testing in R",
+ "section": "Split the logic into smaller functions",
+ "text": "Split the logic into smaller functions\nFunction to generate a plot from the summarised data\n\ncreate_plot <- function(df) {\n df |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}"
},
{
- "objectID": "blogs/index.html",
- "href": "blogs/index.html",
- "title": "Data Science Blog",
- "section": "",
- "text": "Map and Nest\n\n\n\n\n\n\npurrr\n\n\nR\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nAug 8, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nStoring data safely\n\n\n\n\n\n\nlearning\n\n\nR\n\n\nPython\n\n\n\n\n\n\n\n\n\nMay 22, 2024\n\n\nYiWen Hon, Matt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nOne year of coffee & coding\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nMay 13, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nRStudio Tips and Tricks\n\n\n\n\n\n\nlearning\n\n\nR\n\n\n\n\n\n\n\n\n\nMar 21, 2024\n\n\nMatt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nVisualising participant recruitment in R using Sankey plots\n\n\n\n\n\n\nlearning\n\n\ntutorial\n\n\nvisualisation\n\n\nR\n\n\n\n\n\n\n\n\n\nFeb 28, 2024\n\n\nCraig Parylo\n\n\n\n\n\n\n\n\n\n\n\n\nNearest neighbour imputation\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 17, 2024\n\n\nJacqueline Grout\n\n\n\n\n\n\n\n\n\n\n\n\nAdvent of Code and Test Driven Development\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 10, 2024\n\n\nYiWen Hon\n\n\n\n\n\n\n\n\n\n\n\n\nReinstalling R Packages\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nAlternative remote repositories\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nCreating a hotfix with git\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nMar 24, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\nNo matching items"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4",
+ "title": "Unit testing in R",
+ "section": "Split the logic into smaller functions",
+ "text": "Split the logic into smaller functions\nThe original function refactored to use the new functions\n\nmy_big_function <- function(type) {\n conditions <- get_conditions(type)\n\n get_data_from_sql() |>\n summarise_data(conditions) |>\n create_plot()\n}\n\n\nThis is going to be significantly easier to test, because we now can verify that the individual components work correctly, rather than having to consider all of the possibilities at once."
},
{
- "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html",
- "href": "blogs/posts/2023-03-24_hotfix-with-git.html",
- "title": "Creating a hotfix with git",
- "section": "",
- "text": "I recently discovered a bug in a code-base which needed to be fixed and deployed back to production A.S.A.P., but since the last release the code has moved on significantly. The history looks something a bit like:\nThat is, we have a tag which is the code that is currently in production (which we need to patch), a number of commits after that tag to main (which were separate branches merged via pull requests), and a current development branch.\nI need to somehow: 1. go back to the tagged release, 2. check that code out, 3. patch that code, 4. commit this change, but insert the commit before all of the new commits after the tag\nThere are at least two ways that I know to do this, one would be with an interactive rebase, but I used a slightly longer method, but one I feel is a little less likely to get wrong.\nBelow are the step’s that I took. One thing I should note is this worked well for my particular issue because the change didn’t cause any merge conflicts later on."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}"
},
{
- "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
- "href": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
- "title": "Creating a hotfix with git",
- "section": "Fixing my codebase",
- "text": "Fixing my codebase\nFirst, we need to checkout the tag\ngit checkout -b hotfix v0.2.0\nThis creates a new branch called hotfix off of the tag v0.2.0.\nNow that I have the code base checked out at the point I need to fix, I can make the change that is needed, and commit the change\ngit add [FILENAME]\ngit commit -m \"fixes the code\"\n(Obviously, I used the actual file name and gave a better commit message. I Promise 😝)\nNow my code is fixed, I create a new tag for this “release”, as well as push the code to production (this step is omitted here)\ngit tag v0.2.1 -m \"version 0.2.0\"\nAt this point, our history looks something like\n\n\n\n\n\n\n\n\n\nWhat we want to do is break the link between main and v0.2.0, instead attaching tov0.2.1. First though, I want to make sure that if I make a mistake, I’m not making it on the main branch.\ngit checkout main\ngit checkout -b apply-hotfix\nThen we can fix our history using the rebase command\ngit rebase hotfix\nWhat this does is it rolls back to the point where the branch that we are rebasing (apply-hotfix) and the hotfix branch both share a common commit (v0.2.0 tag). It then applies the commits in the hotfix branch, before reapplying the commits from apply-hotfix (a.k.a. the main branch).\nOne thing to note, if you have any merge conflicts created by your fix, then the rebase will stop and ask you to fix the merge conflicts. There is some information in the GitHub doc’s for [resolving merge conflicts after a Git rebase][2].\n[2]: https://docs.github.com/en/get-started/using-git/resolving-merge-conflicts-after-a-git-rebase\nAt this point, we can check that the commit history looks correct\ngit log v0.2.0..HEAD\nIf we are happy, then we can apply this to the main branch. I do this by renaming the apply-hotfix branch as main. 
First, you have to delete the main branch to allow us to rename the branch.\ngit branch -D main\ngit branch -m main\nWe also need to update the other branches to use the new main branch\ngit checkout branch\ngit rebase main\nNow, we should have a history like"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\ntest_that(\"it summarises the data\", {\n # arrange\n \n\n\n\n\n\n\n \n\n \n # act\n \n # assert\n \n})"
},
{
- "objectID": "blogs/posts/2024-05-13_one-year-coffee-code.html",
- "href": "blogs/posts/2024-05-13_one-year-coffee-code.html",
- "title": "One year of coffee & coding",
- "section": "",
- "text": "The data science team have been running coffee & coding sessions for just over a year now. When I joined that Strategy Unit, I was really pleased to see these sessions running as I think making time to discuss and share technical knowledge is highly valuable, especially as an organisation grows.\nCoffee and coding sessions run every two weeks and usually take the form of a short presentation, followed by a discussion. Although we have had a variety of different sessions including live coding demos and show and tell for projects.\nWe figured it would be a good idea to do a quick survey of attendees to make sure that the sessions were beneficial and see if there were any suggestions for future sessions. We had 11 responses, all of which were really positive, with 90% agreeing that the sessions are interesting, and over 80% saying that they learn new things. Respondents said that the sessions were well varied across the technical spectrum and that they “almost always learn something useful”.\nThe two main themes of the results were that sessions were inclusive and sparked collaboration. ✨\n\nI like that everyone can contribute\n\n\nIt’s great seeing what else people are doing\n\n\nI get more ideas for future projects\n\nSome of the main suggestions included more content for newer programmers and encouraging the wider analytical team to share real project examples.\nSo with that, why not consider presenting? The sessions are informal and everyone is welcome to contribute. If you’ve got something to share, please let a member of the data science team know.\nAs a reminder, materials for our previous sessions are available under Presentations."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n \n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nGenerate some random data to build a reasonably sized data frame.\nYou could also create a table manually, but part of the trick of writing good tests for this function is to make it so the dates don’t all have the same count.\nThe reason for this is it’s harder to know for sure that the count worked if every row returns the same value.\nWe don’t need the values to be exactly like they are in the real data, just close enough. Instead of dates, we can use numbers, and instead of actual conditions, we can use letters."
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html",
- "href": "blogs/posts/2024-05-22-storing-data-safely/index.html",
- "title": "Storing data safely",
- "section": "",
- "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nTests need to be reproducible, and generating our table at random will give us unpredictable results.\nSo, we need to set the random seed; now every time this test runs we will generate the same data."
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding",
- "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding",
- "title": "Storing data safely",
- "section": "",
- "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n\n\n\n # act\n \n # assert\n \n})\n\nCreate the conditions table. We don’t need all of the columns that are present in the real csv, just the ones that will make our code work.\nWe also need to test that the filtering join (semi_join) is working, so we want to use a subset of the conditions that were used in df."
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins",
- "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins",
- "title": "Storing data safely",
- "section": "Posit Connect Pins",
- "text": "Posit Connect Pins\n\n# A brief intro to using {pins} to store, version, share and protect a dataset\n# on Posit Connect. Documentation: https://pins.rstudio.com/\n\n\n# Setup -------------------------------------------------------------------\n\n\ninstall.packages(c(\"pins\",\"dplyr\")) # if not yet installed\n\nsuppressPackageStartupMessages({\n library(pins)\n library(dplyr) # for wrangling and the 'starwars' demo dataset\n})\n\nboard <- board_connect() # will error if you haven't authenticated before\n# Error in `check_auth()`: ! auth = `auto` has failed to find a way to authenticate:\n# • `server` and `key` not provided for `auth = 'manual'`\n# • Can't find CONNECT_SERVER and CONNECT_API_KEY envvars for `auth = 'envvar'`\n# • rsconnect package not installed for `auth = 'rsconnect'`\n# Run `rlang::last_trace()` to see where the error occurred.\n\n# To authenticate\n# In RStudio: Tools > Global Options > Publishing > Connect... > Posit Connect\n# public URL of the Strategy Unit Posit Connect Server: connect.strategyunitwm.nhs.uk\n# Your browser will open to the Posit Connect web page and you're prompted to\n# for your password. Enter it and you'll be authenticated.\n\n# Once authenticated\nboard <- board_connect()\n# Connecting to Posit Connect 2024.03.0 at\n# <https://connect.strategyunitwm.nhs.uk>\n\nboard |> pin_list() # see all the pins on that board\n\n\n# Create a pin ------------------------------------------------------------\n\n\n# Write a dataset to the board as a pin\nboard |> pin_write(\n x = starwars,\n name = \"starwars_demo\"\n)\n# Guessing `type = 'rds'`\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_exists(\"starwars_demo\")\n# ! 
Use a fully specified name including user name: \"matt.dray/starwars_demo\",\n# not \"starwars_demo\".\n# [1] TRUE\n\npin_name <- \"matt.dray/starwars_demo\"\n\nboard |> pin_exists(pin_name) # logical, TRUE/FALSE\nboard |> pin_meta(pin_name) # metadata, see also 'metadata' arg in pin_write()\nboard |> pin_browse(pin_name) # view the pin in the browser\n\n\n# Permissions -------------------------------------------------------------\n\n\n# You can let people see and edit a pin. Log into Posit Connect and select the\n# pin under 'Content'. In the 'Settings' panel on the right-hand side, adjust\n# the 'sharing' options in the 'Access' tab.\n\n\n# Overwrite and version ---------------------------------------------------\n\n\nstarwars_droids <- starwars |>\n filter(species == \"Droid\") # beep boop\n\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_versions(pin_name) # see version history\nboard |> pin_versions_prune(pin_name, n = 1) # remove history\nboard |> pin_versions(pin_name)\n\n# What if you try to overwrite the data but it hasn't changed?\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# ! The hash of pin \"matt.dray/starwars_demo\" has not changed.\n# • Your pin will not be stored.\n\n\n# Use the pin -------------------------------------------------------------\n\n\n# You can read a pin to your local machine, or access it from a Quarto file\n# or Shiny app hosted on Connect, for example. If the output and the pin are\n# both on Connect, no authentication is required; the board is defaulted to\n# the Posit Connect instance where they're both hosted.\n\nboard |>\n pin_read(pin_name) |> # like you would use e.g. 
read_csv\n with(data = _, plot(mass, height)) # wow!\n\n\n# Delete pin --------------------------------------------------------------\n\n\nboard |> pin_exists(pin_name) # logical, good function for error handling\nboard |> pin_delete(pin_name)\nboard |> pin_exists(pin_name)"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n \n\n \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nBecause we are generating df randomly, to figure out what our “expected” results are, I simply ran the code inside of the test to generate the “actual” results.\nGenerally, this isn’t a good idea. You are creating the results of your test from the code; ideally, you want to be thinking about what the results of your function should be.\nImagine your function doesn’t work as intended, there is some subtle bug that you are not yet aware of. By writing tests “backwards” you may write test cases that confirm the results, but not expose the bug. This is why it’s good to think about edge cases."
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r",
- "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r",
- "title": "Storing data safely",
- "section": "Azure Storage in R",
- "text": "Azure Storage in R\nYou will need an .Renviron file with the four environment variables listed below for the code to work. This .Renviron file should be ignored by git. You can share the contents of .Renviron files with other team members via Teams, email, or Sharepoint.\nBelow is a sample .Renviron file\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nAZ_STORAGE_CONTAINER=container-name\nAZ_TENANT_ID=long-sequence-of-numbers-and-letters\nAZ_APP_ID=another-long-sequence-of-numbers-and-letters\n\ninstall.packages(c(\"AzureAuth\",\"AzureStor\", \"arrow\")) # if not yet installed\n\n# Load all environment variables\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")\n\n# Authenticate\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\",\n)\n\n# If you have not authenticated before, you will be taken to an external page to\n# authenticate!Use your mlcsu.nhs.uk account.\n\n# Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\n\n# If you get a 403 error when trying to interact with the container, you may \n# have to clear your Azure token and re-authenticate using a different browser.\n# Use AzureAuth::clean_token_directory() to clear your token, then repeat the\n# AzureAuth::get_azure_token() step above.\n\n# Upload specific file to container\nAzureStor::storage_upload(container, \"data/ronald.jpeg\", \"newdir/ronald.jpeg\")\n\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(container, \"data/*\", \"newdir\")\n\n# Check files have uploaded\nblob_list <- AzureStor::list_blobs(container)\n\n# Load file directly from Azure 
container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by temporarily downloading file \n# and storing it in memory)\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\nparq_df <- arrow::read_parquet(parquet_in_memory)\n\n# Delete from Azure container (!!!)\nfor (blobfile in blob_list$name) {\n AzureStor::delete_storage_file(container, blobfile)\n}"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n ) \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nThat said, in cases where we can be confident (say by static analysis of our code) that it is correct, building tests in this way will give us the confidence going forwards that future changes do not break existing functionality.\nIn this case, I have created the expected data frame using the results from running the function."
},
{
- "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python",
- "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python",
- "title": "Storing data safely",
- "section": "Azure Storage in Python",
- "text": "Azure Storage in Python\nThis will use the same environment variables as the R version, just stored in a .env file instead.\nWe didn’t cover this in the presentation, so it’s not in the slides, but the code should be self-explanatory.\n\n\nimport os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as 
sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7",
+ "title": "Unit testing in R",
+ "section": "Let’s test summarise_data",
+ "text": "Let’s test summarise_data\n\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\"))\n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n )\n # act\n actual <- summarise_data(df, conditions)\n # assert\n expect_equal(actual, expected)\n})\n\nTest passed 😸\n\n\n\nThe test works!"
},
{
- "objectID": "blogs/posts/2024-01-17_nearest_neighbour.html",
- "href": "blogs/posts/2024-01-17_nearest_neighbour.html",
- "title": "Nearest neighbour imputation",
- "section": "",
- "text": "Recently I have been gathering data by GP practice, from a variety of different sources. The ultimate purpose of my project is to be able to report at an ICB/sub-ICB level1. The various datasets cover different timescales and consequently changes in GP practices over time have left me with mismatching datasets.\n1 An ICB (Integrated Care Board) is a statutory NHS organisation responsible for planning health services for their local populationsMy approach has been to take as the basis of my project a recent GP List. Later in my project I want to perform calculations at a GP practice level based on an underlying health need and the data for this need is a CHD prevalence value from a dataset that is around 8 years old, and for which there is no update or alternative. From my recent list of 6454 practices, when I match to the need dataset, I am left with 151 practices without a value for need. If I remove these practices from the analysis then this could impact the analysis by sub-ICB since often a group of practices in the same area could be subject to changes, mergers and reorganisation.\nHere’s the packages and some demo objects to work with to create an example for two practices:\n\n\nCode\n# Packages\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(tidygeocoder)\nlibrary(leaflet)\nlibrary(viridisLite)\nlibrary(gt)\n\n# Create some data with two practices with no need data \n# and a selection of practices locally with need data\npractices <- tribble(\n ~practice_code, ~postcode, ~has_orig_need, ~value,\n \"P1\",\"CV1 4FS\", 0, NA,\n \"P2\",\"CV1 3GB\", 1, 7.3,\n \"P3\",\"CV11 5TW\", 1, 6.9,\n \"P4\",\"CV6 3HZ\", 1, 7.1,\n \"P5\",\"CV6 1HS\", 1, 7.7,\n \"P6\",\"CV6 5DF\", 1, 8.2,\n \"P7\",\"CV6 3FA\", 1, 7.9,\n \"P8\",\"CV1 2DL\", 1, 7.5,\n \"P9\",\"CV1 4JH\", 1, 7.7,\n \"P10\",\"CV10 0GQ\", 1, 7.5,\n \"P11\",\"CV10 0JH\", 1, 7.8,\n \"P12\",\"CV11 5QT\", 0, NA,\n \"P13\",\"CV11 6AB\", 1, 7.6,\n \"P14\",\"CV6 4DD\", 1,7.9\n) \n\n# get domain of numeric 
data\n(domain <- range(practices$has_orig_need))\n\n# make a colour palette\npal <- colorNumeric(palette = viridis(2), domain = domain)\n\n\nTo provide a suitable estimate of need for the newer practices without values, all the practices in the dataset were geocoded2 using the geocode function from the {tidygeocoder} package.\n2 Geocoding is the process of converting addresses (often the postcode) into geographic coordinates (such as latitude and longitude) that can be plotted on a map.\npractices <- practices |>\n mutate(id = row_number()) |>\n geocode(postalcode = postcode) |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\n\nCode\npractices |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nhas_orig_need\nvalue\nid\ngeometry\n\n\n\n\nP1\nCV1 4FS\n0\nNA\n1\nc(-1.50686326666667, 52.4141089666667)\n\n\nP2\nCV1 3GB\n1\n7.3\n2\nc(-1.51888, 52.4034199)\n\n\nP3\nCV11 5TW\n1\n6.9\n3\nc(-1.46746, 52.519)\n\n\nP4\nCV6 3HZ\n1\n7.1\n4\nc(-1.52231, 52.42367)\n\n\nP5\nCV6 1HS\n1\n7.7\n5\nc(-1.52542, 52.41989)\n\n\nP6\nCV6 5DF\n1\n8.2\n6\nc(-1.498344825, 52.4250186)\n\n\nP7\nCV6 3FA\n1\n7.9\n7\nc(-1.51787, 52.43135)\n\n\nP8\nCV1 2DL\n1\n7.5\n8\nc(-1.49105, 52.40582)\n\n\nP9\nCV1 4JH\n1\n7.7\n9\nc(-1.50653, 52.41953)\n\n\nP10\nCV10 0GQ\n1\n7.5\n10\nc(-1.52197, 52.54074)\n\n\nP11\nCV10 0JH\n1\n7.8\n11\nc(-1.5163199, 52.53723)\n\n\nP12\nCV11 5QT\n0\nNA\n12\nc(-1.46927, 52.51899)\n\n\nP13\nCV11 6AB\n1\n7.6\n13\nc(-1.45822, 52.52682)\n\n\nP14\nCV6 4DD\n1\n7.9\n14\nc(-1.50832, 52.44104)\n\n\n\n\n\n\n\nThis map shows the practices, purple are the practices with no need data and yellow are practices with need data available.\n\n\nCode\n# make map to display practices\nleaflet(practices) |> \n addTiles() |>\n addCircleMarkers(color = ~pal(has_orig_need)) \n\n\n\n\n\n\nThe data was split into those with, and without, a value for need. 
Using st_join from the {sf} package to join those without, and those with, a value for need, using the geometry to find all those within 1500m (1.5km).\n\nno_need <- practices |>\n filter(has_orig_need == 0)\n\nwith_need <- practices |>\n filter(has_orig_need == 1)\n\n\nneighbours <- no_need |>\n select(no_need_postcode = postcode,no_need_prac_code=practice_code) |>\n st_join(with_need, st_is_within_distance, 1500) |>\n st_drop_geometry() |>\n select(id, no_need_postcode,no_need_prac_code) |>\n inner_join(x = with_need, by = join_by(\"id\")) \n\n\n\nCode\nleaflet(neighbours) |> \n addTiles() |>\n addCircleMarkers(color = \"purple\") |>\n addMarkers( -1.50686326666667, 52.4141089666667, popup = \"Practice with no data\"\n) |>\n addCircles(-1.50686326666667, 52.4141089666667,radius=1500) |>\n addMarkers(-1.46927, 52.51899, popup = \"Practice with no data\"\n) |>\naddCircles(-1.46927, 52.51899,radius=1500)\n\n\n\n\n\n\nThe data for the “neighbours” was grouped by the practice code of those without need data and a mean value was calculated for each practice to generate an estimated value.\n\nneighbours_estimate <- neighbours |>\n group_by(no_need_prac_code) |>\n summarise(need_est=mean(value)) |>\n st_drop_geometry(select(no_need_prac_code,need_est)) \n\nThe original data was joined back to the “neighbours”.\n\n practices_with_neighbours_estimate <- practices |>\n left_join(neighbours_estimate, join_by(practice_code==no_need_prac_code)) |>\n st_drop_geometry(select(practice_code,need_est))\n\n\n\nCode\n practices_with_neighbours_estimate |>\n select(-has_orig_need,-id) |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nvalue\nneed_est\n\n\n\n\nP1\nCV1 4FS\nNA\n7.583333\n\n\nP2\nCV1 3GB\n7.3\nNA\n\n\nP3\nCV11 5TW\n6.9\nNA\n\n\nP4\nCV6 3HZ\n7.1\nNA\n\n\nP5\nCV6 1HS\n7.7\nNA\n\n\nP6\nCV6 5DF\n8.2\nNA\n\n\nP7\nCV6 3FA\n7.9\nNA\n\n\nP8\nCV1 2DL\n7.5\nNA\n\n\nP9\nCV1 4JH\n7.7\nNA\n\n\nP10\nCV10 0GQ\n7.5\nNA\n\n\nP11\nCV10 0JH\n7.8\nNA\n\n\nP12\nCV11 
5QT\nNA\n7.250000\n\n\nP13\nCV11 6AB\n7.6\nNA\n\n\nP14\nCV6 4DD\n7.9\nNA\n\n\n\n\n\n\n\nFinally, an updated data frame was created of the need data using the actual need for the practice where available, otherwise using estimated need.\n\npractices_with_neighbours_estimate <- practices_with_neighbours_estimate |>\n mutate(need_to_use = case_when(value>=0 ~ value,\n .default = need_est)) |>\n select(practice_code,need_to_use) \n\n\n\n\n\n\n\n\n\npractice_code\nneed_to_use\n\n\n\n\nP1\n7.583333\n\n\nP2\n7.300000\n\n\nP3\n6.900000\n\n\nP4\n7.100000\n\n\nP5\n7.700000\n\n\nP6\n8.200000\n\n\nP7\n7.900000\n\n\nP8\n7.500000\n\n\nP9\n7.700000\n\n\nP10\n7.500000\n\n\nP11\n7.800000\n\n\nP12\n7.250000\n\n\nP13\n7.600000\n\n\nP14\n7.900000\n\n\n\n\n\n\n\nFor my project, this method has successfully generated a prevalence for 125 of the 151 practices without a need value, leaving just 26 practices without a need. This is using a 1.5 km radius. In each use case there will be a decision to make regarding a more accurate estimate (smaller radius) and therefore fewer matches versus a less accurate estimate (using a larger radius) and therefore more matches.\nThis approach could be replicated for other similar uses/purposes. A topical example from an SU project is the need to assign population prevalence for hypertension and compare it to current QOF3 data. Again, the prevalence data is a few years old so we have to move the historical data to fit with current practices and this leaves missing data that can be estimated using this method.\n\n\n3 QOF (Quality and Outcomes Framework) is a voluntary annual reward and incentive programme for all GP practices in England, detailing practice achievement results."
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps",
+ "title": "Unit testing in R",
+ "section": "Next steps",
+ "text": "Next steps\n\nYou can add tests to any R project (to test functions),\nBut {testthat} works best with Packages\nThe R Packages book has 3 chapters on testing\nThere are two useful helper functions in {usethis}\n\nuse_testthat() will set up the folders for test scripts\nuse_test() will create a test file for the currently open script"
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html",
- "href": "blogs/posts/2024-02-28_sankey_plot.html",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "",
- "text": "Sankey diagrams are great tools to visualise flows through a system. They show connections between the steps of a process where the width of the arrows is proportional to the flow.\nI’m working on an evaluation of a risk screening process for people aged between 55-74 years and a history of smoking. In this Targeted Lung Health Check (TLHC) programme1 eligible people are invited to attend a free lung check where those assessed at high risk of lung cancer are then offered low-dose CT screening scans.\n1 Please visit the NHS England site for for more background.We used Sankey diagrams to visualise how people have engaged with the programme, from recruitment, attendance at appointments, their outcome from risk assessment, attendance at CT scans and will eventually be extended to cover the impact of the screening on early detection of those diagnosed with lung cancer.\nThis blog post is about the technical process of preparing record-level data for visualisation in a Sankey plot using R and customising it to enhance look and feel. Here is how the finished product will look:"
+ "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1",
+ "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1",
+ "title": "Unit testing in R",
+ "section": "Next steps",
+ "text": "Next steps\n\nIf your test needs to temporarily create a file, or change some R-options, the {withr} package has a lot of useful functions that will automatically clean things up when the test finishes\nIf you are writing tests that involve calling out to a database, or you want to test my_big_function (from before) without calling the intermediate functions, then you should look at the {mockery} package"
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data",
- "href": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "Get the data",
- "text": "Get the data\nIn this example we will work with a simplified set of data focused on invitations.\nThe invites table holds details of when people were sent a letter or message inviting them to take part, how many times they were invited and how the person responded.\nThe people eligible for the programme are identified up-front and are represented by a unique ID with one row per person. Let’s assume each person receives at least one invitation to take part, they can have one of three outcomes:\n\nThey accept the invitation and agree to take part,\nThey decline the invitation,\nThey do not respond to the invitation.\n\nIf the person doesn’t respond to the first invitation they may be sent a second invitation and could be offered a third invitation if they didn’t respond to the second.\nHere is the specification for our simplified invites table:\n\nInvites specification\n\n\n\n\n\n\n\nField\nType\nDescription\n\n\n\n\nParticipant ID\nInteger\nA unique identifier for each person.\n\n\nInvite date 1\nDate\nThe date the person was first invited to participate.\nEvery person will have a date in this field.\n\n\nInvite date 2\nDate\nThe date a second invitation was sent.\n\n\nInvite date 3\nDate\nThe date a third invitation was sent.\n\n\nInvite outcome\nText\nThe outcome from the invite, one of either ‘Accepted’, ‘Declined’ or ‘No response’.\n\n\n\nEveryone receives at least one invite. Assuming a third of these respond (to either accept or decline) then two-thirds receive a follow-up invite. 
Of these, we assume half respond, meaning the remaining participants receive a third invite.\nHere we generate 100 rows of example data to populate our table.\n\n\nCode\n# set a randomisation seed for reproducibility\nset.seed(seed = 1234)\n\n# define some parameters\nstart_date = as.Date('2019-01-01')\nend_date = as.Date('2021-01-01')\nrows = 100\n\ndf_invites_1 <- tibble(\n # create a unique id for each participant\n participant_id = 1:rows,\n \n # create a random initial invite date between our start and end dates\n invite_1_date = sample(\n seq(start_date, end_date, by = 'day'), \n size = rows, replace = T\n ),\n \n # create a random outcome for this participant\n invite_outcome = sample(\n x = c('Accepted', 'Declined', 'No response'),\n size = rows, replace = T\n )\n)\n\n# take a sample of participants and allocate them a second invite date\ndf_invites_2 <- df_invites_1 |>\n # sample two thirds of participants to get a second invite\n slice_sample(prop = 2/3) |> \n # allocate a date between 10 and 30 days following the first\n mutate(\n invite_2_date = invite_1_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_2_date)\n\n\n# take a sample of those with a second invite and allocate them a third invite date\ndf_invites_3 <- df_invites_2 |> \n # sample half of these to get a third invite\n slice_sample(prop = 1/2) |> \n # allocate a date between 10 to 30 days following the second\n mutate(\n invite_3_date = invite_2_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_3_date)\n\n# combine the 2nd and 3rd invites with the first table\ndf_invites <- df_invites_1 |> \n left_join(\n y = df_invites_2, \n by = 'participant_id'\n ) |> \n left_join(\n y = df_invites_3,\n by = 'participant_id'\n ) |> \n # move the outcome field after the third invite\n relocate(invite_outcome, .after = invite_3_date)\n\n# housekeeping\nrm(df_invites_1, 
df_invites_2, df_invites_3, start_date, end_date, rows)\n\n# view our data\ndf_invites |> \n reactable(defaultPageSize = 5)\n\n\n\n\nGenerated invite table"
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
+ "title": "Coffee and Coding",
+ "section": "Which is easier to read?",
+ "text": "Which is easier to read?\n\nae_attendances |>\n filter(org_code %in% c(\"RNA\", \"RL4\")) |>\n mutate(performance = 1 + breaches / attendances) |>\n filter(type == 1) |>\n mutate(met_target = performance >= 0.95)\n\nor\n\nae_attendances |>\n filter(\n org_code %in% c(\"RNA\", \"RL4\"),\n type == 1\n ) |>\n mutate(\n performance = 1 + breaches / attendances,\n met_target = performance >= 0.95\n )\n\n\n spending a few seconds to neatly format your code can greatly improve the legibility to future readers, making the intent of the code far clearer, and will make finding bugs easier to spot.\n\n\n (have you spotted the mistake in the snippets above?)"
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes",
- "href": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "Determine milestone outcomes",
- "text": "Determine milestone outcomes\nThe next step is to take our source table and convert the data into a series of milestones (and associated outcomes) that represents how our invited participants moved through the pathway.\nIn our example we have five milestones to represent in our Sankey plot:\n\nOur eligible population (everyone in our invites table),\nThe result from the first invitation,\nThe result from the second invitation,\nThe result from the third invitation,\nThe overall invite outcome.\n\nAside from the eligible population, where everyone starts with the same value, participants will have one of several outcomes at each milestone. This step is about naming these milestones and the outcomes.\nIt is important that each milestone-outcome has unique values. An outcome of ‘No response’ can be recorded against the first, second and third invite, and we wish to see these outcomes separately represented on the Sankey (rather than just one ‘No response’), so each outcome must be made unique. In this example we prefix the outcome from each invite with the number of the invite, e.g. 
‘Invite 1 No response’.\nThe reason for this will become clearer when we come to plot the Sankey, but for now we produce these milestone-outcomes from our invites table.\n\n\nCode\ndf_milestones <- df_invites |> \n mutate(\n # everyone starts in the eligible population\n start_population = 'Eligible population',\n \n # work out what happened following the first invite\n invite_1_outcome = case_when(\n # if a second invite was sent we assume there was no outcome from the first\n !is.na(invite_2_date) ~ 'Invitation 1 No response',\n # otherwise the overall outcome resulted from the first invite\n .default = glue('Invitation 1 {invite_outcome}')\n ),\n \n # work out what happened following the second invite\n invite_2_outcome = case_when(\n # if a third invite was sent we assume there was no outcome from the second\n !is.na(invite_3_date) ~ 'Invitation 2 No response',\n # if a second invite was sent but no third then\n !is.na(invite_2_date) ~ glue('Invitation 2 {invite_outcome}'),\n # default to NA if neither of the above are true\n .default = NA\n ),\n \n # work out what happened following the third invite\n invite_3_outcome = case_when(\n # if a third invite was sent then the outcome is the overall outcome\n !is.na(invite_3_date) ~ glue('Invitation 3 {invite_outcome}'),\n # otherwise mark as NA\n .default = NA\n )\n ) |> \n # exclude the dates as they are no longer needed\n select(-contains('_date')) |> \n # move the overall invite outcome to the end\n relocate(invite_outcome, .after = invite_3_outcome)\n\n# view our data\ndf_milestones |> \n reactable(defaultPageSize = 5)\n\n\n\n\nMilestone-outcomes for participants"
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
+ "title": "Coffee and Coding",
+ "section": "Tidyverse Style Guide",
+ "text": "Tidyverse Style Guide\n\nGood coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread\n\n\nAll style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.\n\ntidyverse style guide"
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows",
- "href": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "Calculate flows",
- "text": "Calculate flows\nNext we take pairs of milestone-outcomes and calculate the number of participants that moved between them.\nHere we utilise the power of dplyr::summarise with an argument .by to group by our data before counting the number of unique participants who move between our start and end groups.\nFor invites 2 and 3 we perform two sets of summaries:\n\nThe first where the values in the to and from fields contain details.\nThe second to capture cases where the to destination is NULL. This is because the participant responded at the previous invite so there was no subsequent invite. In these cases we flow the participant to the overall invite outcome.2\n\n2 If you are thinking there is a lot of repetition here, you’re right. In practice I abstracted both steps to a function and passed in the parameters for the from and to variables and simplified my workflow a little, however, I’m showing it in plain form here for simplification.\n\nCode\ndf_flows <- bind_rows(\n \n # flow from population to invite 1\n df_milestones |> \n filter(!is.na(start_population) & !is.na(invite_1_outcome)) |> \n rename(from = start_population, to = invite_1_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to invite 2 (where not NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & !is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_2_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to overall invite outcome (where invite 2 is NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to invite 3 (where not NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & !is.na(invite_3_outcome)) |> \n 
rename(from = invite_2_outcome, to = invite_3_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to overall invite outcome (where invite 3 is NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # final flow - invite 3 to overall outcome (where both are not NA)\n df_milestones |> \n filter(!is.na(invite_3_outcome) & !is.na(invite_outcome)) |> \n rename(from = invite_3_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n )\n)\n\n# view our data\ndf_flows |> \n reactable(defaultPageSize = 5)\n\n\n\n\nFlows of participants between milestones"
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
+ "title": "Coffee and Coding",
+ "section": "{lintr} + {styler} are your new best friends",
+ "text": "{lintr} + {styler} are your new best friends\n\n\n{lintr}\n\n{lintr} is a static code analysis tool that inspects your code (without running it)\nit checks for certain classes of errors (e.g. mismatched { and (’s)\nit warns about potential issues (e.g. using variables that aren’t defined)\nit warns about places where you are not adhering to the code style\n\n\n{styler}\n\n{styler} is an RStudio add in that automatically reformats your code, tidying it up to match the style guide\n99.9% of the time it will give you equivalent code, but there is the potential that it may change the behaviour of your code\nit will overwrite the files that you ask it to run on however, so it is vital to be using version control\na good workflow here is to save your file, “stage” the changes to your file, then run {styler}. You can then revert back to the staged changed if needed."
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly",
- "href": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "Preparing for plotly",
- "text": "Preparing for plotly\nPlotly expects to be fed two sets of data:\n\nNodes - these are the milestones we have in our from and to fields,\nEdges - these are the flows that occur between nodes, the flow in our table.\n\nIt is possible to extract this data by hand but I found using the tidygraph package was much easier and more convenient.\n\ndf_sankey <- df_flows |> \n # convert our flows data to a tidy graph object\n as_tbl_graph()\n\nThe tidygraph package splits our data into nodes and edges. We can selectively work on each by ‘activating’ them - here is the nodes list:\n\ndf_sankey |> \n activate(what = 'nodes') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nYou can see each unique node name listed. The row numbers for these nodes are used as reference IDs in the edges object:\n\ndf_sankey |> \n activate(what = 'edges') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nWe now have enough information to generate our Sankey.\nFirst we extract our nodes and edges to separate data frames then convert the ID values to be zero-based (starts at 0) as this is what plotly is expecting. To do this is as simple as subtracting 1 from the value of the IDs.\nFinally we pass these two dataframes to plotly’s node and link function inputs to generate the plot.\n\n\nCode\n# extract the nodes to a dataframe\nnodes <- df_sankey |> \n activate(nodes) |> \n data.frame() |> \n mutate(\n id = row_number() -1\n )\n\n# extract the edges to a dataframe\nedges <- df_sankey |> \n activate(edges) |> \n data.frame() |> \n mutate(\n from = from - 1,\n to = to - 1\n )\n\n# plot our sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name\n ),\n \n # use our link data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow\n )\n)\n\n\n\n\nOur first sankey\n\n\nNot bad!\nWe can see the structure of our Sankey now. 
Can you see the relative proportions of participants who did or didn’t respond to our first invite? Marvel at how those who responded to the first invite flow into our final outcome. How about those who didn’t respond to the first invitation go on to receive a second invite?\nPlotly’s charts are interactive. Try hovering your cursor over the nodes and edges to highlight them and a pop-up box will appear giving you additional details. You can reorder the vertical position of the nodes by dragging them above or below an adjacent node.\nThis looks functional."
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like",
+ "title": "Coffee and Coding",
+ "section": "What does {lintr} look like?",
+ "text": "What does {lintr} look like?\n\n\n\nsource: Good practice for writing R code and R packages\n\nrunning lintr can be done in the console, e.g.\n\nlintr::lintr_dir(\".\")\n\nor via the Addins menu"
},
{
- "objectID": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey",
- "href": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey",
- "title": "Visualising participant recruitment in R using Sankey plots",
- "section": "Styling our Sankey",
- "text": "Styling our Sankey\nNow we have the foundations of our Sankey I’d like to move on to its presentation. Specifically I’d like to:\n\nuse colour coding to clearly group those who accept or decline the invite,\nimprove the readability of the node titles,\nadd additional information to the pop-up boxes when you hover over nodes and edges, and\ncontrol the positioning of the nodes in the plot.\n\nAs our nodes and edges objects are dataframes it is straightforward to add this styling information directly to them.\nFor the nodes object we define colours based on the name of each node and manually position them in the plot\n\n\nCode\n# get the eligible population as a single value\n# NB, will be used to work out % amounts in each node and edge\ntemp_eligible_pop <- df_flows |> \n filter(from == 'Eligible population') |> \n summarise(total = sum(flow, na.rm = T)) |> \n pull(total)\n\n# style our nodes object\nnodes <- nodes |> \n mutate(\n # colour ----\n # add colour definitions, green for accepted, red for declined\n colour = case_when(\n str_detect(name, 'Accepted') ~ '#44bd32',\n str_detect(name, 'Declined') ~ '#c23616',\n str_detect(name, 'No response') ~ '#7f8fa6',\n str_detect(name, 'Eligible population') ~ '#7f8fa6'\n ),\n \n # add a semi-transparent colour for the edges based on node colours\n colour_fade = col2hcl(colour = colour, alpha = 0.3),\n \n # positioning ----\n # NB, I found that to position nodes you need to supply both\n # horizontal and vertical positions\n # NNB, it was a bit of trial and error to get the these positions just\n # right\n \n # horizontal positions (0 = left, 1 = right)\n x = case_when(\n str_detect(name, 'Eligible population') ~ 1,\n str_detect(name, 'Invitation 1') ~ 2,\n str_detect(name, 'Invitation 2') ~ 3,\n str_detect(name, 'Invitation 3') ~ 4,\n .default = 5\n ) |> rescale(to = c(0.001, 0.9)),\n \n # vertical position (1 = bottom, 0 = top)\n y = case_when(\n str_detect(name, 'Eligible population') ~ 5,\n # invite 1\n 
str_detect(name, 'Invitation 1 Accepted') ~ 1,\n str_detect(name, 'Invitation 1 No response') ~ 5,\n str_detect(name, 'Invitation 1 Declined') ~ 8.5,\n # invite 2\n str_detect(name, 'Invitation 2 Accepted') ~ 2,\n str_detect(name, 'Invitation 2 No response') ~ 5,\n str_detect(name, 'Invitation 2 Declined') ~ 7.8,\n # invite 3\n str_detect(name, 'Invitation 3 Accepted') ~ 2.7,\n str_detect(name, 'Invitation 3 No response') ~ 5.8,\n str_detect(name, 'Invitation 3 Declined') ~ 7.2,\n # final outcomes\n str_detect(name, 'Accepted') ~ 1,\n str_detect(name, 'No response') ~ 5,\n str_detect(name, 'Declined') ~ 8,\n .default = 5\n ) |> rescale(to = c(0.001, 0.999))\n ) |> \n # add in a custom field to show the percentage flow\n left_join(\n y = df_flows |> \n group_by(to) |> \n summarise(\n flow = sum(flow, na.rm = T),\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1),\n ) |> \n select(name = to, flow_perc),\n by = 'name'\n )\n\n# view our nodes data\nnodes |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the nodes dataframe\n\n\nNext we move to styling the edges, which is a much simpler prospect:\n\n\nCode\nedges <- edges |> \n mutate(\n # add a label for each flow to tell us how many people are in each\n label = number(flow, big.mark = ','),\n # add a percentage flow figure\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1)\n ) |> \n # add the faded colour from our nodes object to match the destinations\n left_join(\n y = nodes |> select(to = id, colour_fade),\n by = 'to'\n )\n\n# view our edges data\nedges |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the edges dataframe\n\n\nWe now have stylised node and edge tables ready and can bring it all together. 
Note the use of customdata and hovertemplate help to bring in additional information and styling to the pop-up boxes that appear when you hover over each flow and node.\n\n\nCode\n# plot our stylised sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name,\n color = nodes$colour,\n x = nodes$x,\n y = nodes$y,\n customdata = nodes$flow_perc,\n hovertemplate = '%{label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n ),\n \n # use our edge data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow,\n label = edges$label,\n color = edges$colour_fade,\n customdata = edges$flow_perc,\n hovertemplate = '%{source.label} → %{target.label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n )\n) |> \n layout(\n font = list(\n family = 'Arial, Helvetica, sans-serif',\n size = 12\n ),\n # make the background transparent (also removes the text shadow)\n paper_bgcolor = 'rgba(0,0,0,0)'\n ) |> \n config(responsive = T)\n\n\n\n\nA stylish Sankey"
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler",
+ "title": "Coffee and Coding",
+ "section": "Using {styler}",
+ "text": "Using {styler}\n\nsource: Good practice for writing R code and R packages"
+ },
+ {
+ "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility",
+ "href": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility",
+ "title": "Coffee and Coding",
+ "section": "Further thoughts on improving code legibility",
+ "text": "Further thoughts on improving code legibility\n\ndo not let files grow too big\nbreak up logic into separate files, then you can use source(\"filename.R) to run the code in that file\nidealy, break up your logic into separate functions, each function having it’s own file, and then call those functions within your analysis\ndo not repeat yourself - if you are copying and pasting your code then you should be thinking about how to write a single function to handle this repeated logic\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
"objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#the-team",
@@ -1254,718 +1184,837 @@
"text": "Inputs App"
},
{
- "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app",
- "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app",
- "title": "An Introduction to the New Hospital Programme Demand Model",
- "section": "Outputs App",
- "text": "Outputs App\nA {shiny} app that allows the user to view the results of model runs."
+ "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app",
+ "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app",
+ "title": "An Introduction to the New Hospital Programme Demand Model",
+ "section": "Outputs App",
+ "text": "Outputs App\nA {shiny} app that allows the user to view the results of model runs."
+ },
+ {
+ "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app-1",
+ "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app-1",
+ "title": "An Introduction to the New Hospital Programme Demand Model",
+ "section": "Outputs App",
+ "text": "Outputs App"
+ },
+ {
+ "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#questions",
+ "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#questions",
+ "title": "An Introduction to the New Hospital Programme Demand Model",
+ "section": "Questions?",
+ "text": "Questions?\n\nContact The Strategy Unit\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\nContact Me\n\n\n thomas.jemmett@nhs.net\n tomjemmett\n\n\n\n\n\nview slides at https://tinyurl.com/haca23nhp"
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "",
+ "text": "Sankey diagrams are great tools to visualise flows through a system. They show connections between the steps of a process where the width of the arrows is proportional to the flow.\nI’m working on an evaluation of a risk screening process for people aged between 55-74 years and a history of smoking. In this Targeted Lung Health Check (TLHC) programme1 eligible people are invited to attend a free lung check where those assessed at high risk of lung cancer are then offered low-dose CT screening scans.\n1 Please visit the NHS England site for for more background.We used Sankey diagrams to visualise how people have engaged with the programme, from recruitment, attendance at appointments, their outcome from risk assessment, attendance at CT scans and will eventually be extended to cover the impact of the screening on early detection of those diagnosed with lung cancer.\nThis blog post is about the technical process of preparing record-level data for visualisation in a Sankey plot using R and customising it to enhance look and feel. Here is how the finished product will look:"
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html#get-the-data",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "Get the data",
+ "text": "Get the data\nIn this example we will work with a simplified set of data focused on invitations.\nThe invites table holds details of when people were sent a letter or message inviting them to take part, how many times they were invited and how the person responded.\nThe people eligible for the programme are identified up-front and are represented by a unique ID with one row per person. Let’s assume each person receives at least one invitation to take part, they can have one of three outcomes:\n\nThey accept the invitation and agree to take part,\nThey decline the invitation,\nThey do not respond to the invitation.\n\nIf the person doesn’t respond to the first invitation they may be sent a second invitation and could be offered a third invitation if they didn’t respond to the second.\nHere is the specification for our simplified invites table:\n\nInvites specification\n\n\n\n\n\n\n\nField\nType\nDescription\n\n\n\n\nParticipant ID\nInteger\nA unique identifier for each person.\n\n\nInvite date 1\nDate\nThe date the person was first invited to participate.\nEvery person will have a date in this field.\n\n\nInvite date 2\nDate\nThe date a second invitation was sent.\n\n\nInvite date 3\nDate\nThe date a third invitation was sent.\n\n\nInvite outcome\nText\nThe outcome from the invite, one of either ‘Accepted’, ‘Declined’ or ‘No response’.\n\n\n\nEveryone receives at least one invite. Assuming a third of these respond (to either accept or decline) then two-thirds receive a follow-up invite. 
Of these, we assume half respond, meaning the remaining participants receive a third invite.\nHere we generate 100 rows of example data to populate our table.\n\n\nCode\n# set a randomisation seed for reproducibility\nset.seed(seed = 1234)\n\n# define some parameters\nstart_date = as.Date('2019-01-01')\nend_date = as.Date('2021-01-01')\nrows = 100\n\ndf_invites_1 <- tibble(\n # create a unique id for each participant\n participant_id = 1:rows,\n \n # create a random initial invite date between our start and end dates\n invite_1_date = sample(\n seq(start_date, end_date, by = 'day'), \n size = rows, replace = T\n ),\n \n # create a random outcome for this participant\n invite_outcome = sample(\n x = c('Accepted', 'Declined', 'No response'),\n size = rows, replace = T\n )\n)\n\n# take a sample of participants and allocate them a second invite date\ndf_invites_2 <- df_invites_1 |>\n # sample two thirds of participants to get a second invite\n slice_sample(prop = 2/3) |> \n # allocate a date between 10 and 30 days following the first\n mutate(\n invite_2_date = invite_1_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_2_date)\n\n\n# take a sample of those with a second invite and allocate them a third invite date\ndf_invites_3 <- df_invites_2 |> \n # sample half of these to get a third invite\n slice_sample(prop = 1/2) |> \n # allocate a date between 10 to 30 days following the second\n mutate(\n invite_3_date = invite_2_date + sample(10:30, size = n(), replace = T)\n ) |> \n # keep just id and second date\n select(participant_id, invite_3_date)\n\n# combine the 2nd and 3rd invites with the first table\ndf_invites <- df_invites_1 |> \n left_join(\n y = df_invites_2, \n by = 'participant_id'\n ) |> \n left_join(\n y = df_invites_3,\n by = 'participant_id'\n ) |> \n # move the outcome field after the third invite\n relocate(invite_outcome, .after = invite_3_date)\n\n# housekeeping\nrm(df_invites_1, 
df_invites_2, df_invites_3, start_date, end_date, rows)\n\n# view our data\ndf_invites |> \n reactable(defaultPageSize = 5)\n\n\n\n\nGenerated invite table"
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html#determine-milestone-outcomes",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "Determine milestone outcomes",
+ "text": "Determine milestone outcomes\nThe next step is to take our source table and convert the data into a series of milestones (and associated outcomes) that represents how our invited participants moved through the pathway.\nIn our example we have five milestones to represent in our Sankey plot:\n\nOur eligible population (everyone in our invites table),\nThe result from the first invitation,\nThe result from the second invitation,\nThe result from the third invitation,\nThe overall invite outcome.\n\nAside from the eligible population, where everyone starts with the same value, participants will have one of several outcomes at each milestone. This step is about naming these milestones and the outcomes.\nIt is important that each milestone-outcome has unique values. An outcome of ‘No response’ can be recorded against the first, second and third invite, and we wish to see these outcomes separately represented on the Sankey (rather than just one ‘No response’), so each outcome must be made unique. In this example we prefix the outcome from each invite with the number of the invite, e.g. 
‘Invite 1 No response’.\nThe reason for this will become clearer when we come to plot the Sankey, but for now we produce these milestone-outcomes from our invites table.\n\n\nCode\ndf_milestones <- df_invites |> \n mutate(\n # everyone starts in the eligible population\n start_population = 'Eligible population',\n \n # work out what happened following the first invite\n invite_1_outcome = case_when(\n # if a second invite was sent we assume there was no outcome from the first\n !is.na(invite_2_date) ~ 'Invitation 1 No response',\n # otherwise the overall outcome resulted from the first invite\n .default = glue('Invitation 1 {invite_outcome}')\n ),\n \n # work out what happened following the second invite\n invite_2_outcome = case_when(\n # if a third invite was sent we assume there was no outcome from the second\n !is.na(invite_3_date) ~ 'Invitation 2 No response',\n # if a second invite was sent but no third then\n !is.na(invite_2_date) ~ glue('Invitation 2 {invite_outcome}'),\n # default to NA if neither of the above are true\n .default = NA\n ),\n \n # work out what happened following the third invite\n invite_3_outcome = case_when(\n # if a third invite was sent then the outcome is the overall outcome\n !is.na(invite_3_date) ~ glue('Invitation 3 {invite_outcome}'),\n # otherwise mark as NA\n .default = NA\n )\n ) |> \n # exclude the dates as they are no longer needed\n select(-contains('_date')) |> \n # move the overall invite outcome to the end\n relocate(invite_outcome, .after = invite_3_outcome)\n\n# view our data\ndf_milestones |> \n reactable(defaultPageSize = 5)\n\n\n\n\nMilestone-outcomes for participants"
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html#calculate-flows",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "Calculate flows",
+ "text": "Calculate flows\nNext we take pairs of milestone-outcomes and calculate the number of participants that moved between them.\nHere we utilise the power of dplyr::summarise with an argument .by to group by our data before counting the number of unique participants who move between our start and end groups.\nFor invites 2 and 3 we perform two sets of summaries:\n\nThe first where the values in the to and from fields contain details.\nThe second to capture cases where the to destination is NULL. This is because the participant responded at the previous invite so there was no subsequent invite. In these cases we flow the participant to the overall invite outcome.2\n\n2 If you are thinking there is a lot of repetition here, you’re right. In practice I abstracted both steps to a function and passed in the parameters for the from and to variables and simplified my workflow a little, however, I’m showing it in plain form here for simplification.\n\nCode\ndf_flows <- bind_rows(\n \n # flow from population to invite 1\n df_milestones |> \n filter(!is.na(start_population) & !is.na(invite_1_outcome)) |> \n rename(from = start_population, to = invite_1_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to invite 2 (where not NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & !is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_2_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 1 to overall invite outcome (where invite 2 is NA)\n df_milestones |> \n filter(!is.na(invite_1_outcome) & is.na(invite_2_outcome)) |> \n rename(from = invite_1_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to invite 3 (where not NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & !is.na(invite_3_outcome)) |> \n 
rename(from = invite_2_outcome, to = invite_3_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # flow from invite 2 to overall invite outcome (where invite 3 is NA)\n df_milestones |> \n filter(!is.na(invite_2_outcome) & is.na(invite_3_outcome)) |> \n rename(from = invite_2_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n ),\n \n # final flow - invite 3 to overall outcome (where both are not NA)\n df_milestones |> \n filter(!is.na(invite_3_outcome) & !is.na(invite_outcome)) |> \n rename(from = invite_3_outcome, to = invite_outcome) |> \n summarise(\n flow = n_distinct(participant_id, na.rm = T),\n .by = c(from, to)\n )\n)\n\n# view our data\ndf_flows |> \n reactable(defaultPageSize = 5)\n\n\n\n\nFlows of participants between milestones"
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html#preparing-for-plotly",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "Preparing for plotly",
+ "text": "Preparing for plotly\nPlotly expects to be fed two sets of data:\n\nNodes - these are the milestones we have in our from and to fields,\nEdges - these are the flows that occur between nodes, the flow in our table.\n\nIt is possible to extract this data by hand but I found using the tidygraph package was much easier and more convenient.\n\ndf_sankey <- df_flows |> \n # convert our flows data to a tidy graph object\n as_tbl_graph()\n\nThe tidygraph package splits our data into nodes and edges. We can selectively work on each by ‘activating’ them - here is the nodes list:\n\ndf_sankey |> \n activate(what = 'nodes') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nYou can see each unique node name listed. The row numbers for these nodes are used as reference IDs in the edges object:\n\ndf_sankey |> \n activate(what = 'edges') |> \n as_tibble() |> \n reactable(defaultPageSize = 5)\n\n\n\n\n\nWe now have enough information to generate our Sankey.\nFirst we extract our nodes and edges to separate data frames, then convert the ID values to be zero-based (starting at 0), as this is what plotly expects. This is as simple as subtracting 1 from each ID.\nFinally we pass these two dataframes to plotly’s node and link function inputs to generate the plot.\n\n\nCode\n# extract the nodes to a dataframe\nnodes <- df_sankey |> \n activate(nodes) |> \n data.frame() |> \n mutate(\n id = row_number() -1\n )\n\n# extract the edges to a dataframe\nedges <- df_sankey |> \n activate(edges) |> \n data.frame() |> \n mutate(\n from = from - 1,\n to = to - 1\n )\n\n# plot our sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name\n ),\n \n # use our link data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow\n )\n)\n\n\n\n\nOur first sankey\n\n\nNot bad!\nWe can see the structure of our Sankey now. 
Can you see the relative proportions of participants who did or didn’t respond to our first invite? Marvel at how those who responded to the first invite flow into our final outcome. Notice how those who didn’t respond to the first invitation go on to receive a second invite.\nPlotly’s charts are interactive. Try hovering your cursor over the nodes and edges to highlight them and a pop-up box will appear giving you additional details. You can reorder the vertical position of the nodes by dragging them above or below an adjacent node.\nThis looks functional."
+ },
+ {
+ "objectID": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey",
+ "href": "blogs/posts/2024-02-28_sankey_plot.html#styling-our-sankey",
+ "title": "Visualising participant recruitment in R using Sankey plots",
+ "section": "Styling our Sankey",
+ "text": "Styling our Sankey\nNow we have the foundations of our Sankey I’d like to move on to its presentation. Specifically I’d like to:\n\nuse colour coding to clearly group those who accept or decline the invite,\nimprove the readability of the node titles,\nadd additional information to the pop-up boxes when you hover over nodes and edges, and\ncontrol the positioning of the nodes in the plot.\n\nAs our nodes and edges objects are dataframes it is straightforward to add this styling information directly to them.\nFor the nodes object we define colours based on the name of each node and manually position them in the plot\n\n\nCode\n# get the eligible population as a single value\n# NB, will be used to work out % amounts in each node and edge\ntemp_eligible_pop <- df_flows |> \n filter(from == 'Eligible population') |> \n summarise(total = sum(flow, na.rm = T)) |> \n pull(total)\n\n# style our nodes object\nnodes <- nodes |> \n mutate(\n # colour ----\n # add colour definitions, green for accepted, red for declined\n colour = case_when(\n str_detect(name, 'Accepted') ~ '#44bd32',\n str_detect(name, 'Declined') ~ '#c23616',\n str_detect(name, 'No response') ~ '#7f8fa6',\n str_detect(name, 'Eligible population') ~ '#7f8fa6'\n ),\n \n # add a semi-transparent colour for the edges based on node colours\n colour_fade = col2hcl(colour = colour, alpha = 0.3),\n \n # positioning ----\n # NB, I found that to position nodes you need to supply both\n # horizontal and vertical positions\n # NNB, it was a bit of trial and error to get the these positions just\n # right\n \n # horizontal positions (0 = left, 1 = right)\n x = case_when(\n str_detect(name, 'Eligible population') ~ 1,\n str_detect(name, 'Invitation 1') ~ 2,\n str_detect(name, 'Invitation 2') ~ 3,\n str_detect(name, 'Invitation 3') ~ 4,\n .default = 5\n ) |> rescale(to = c(0.001, 0.9)),\n \n # vertical position (1 = bottom, 0 = top)\n y = case_when(\n str_detect(name, 'Eligible population') ~ 5,\n # invite 1\n 
str_detect(name, 'Invitation 1 Accepted') ~ 1,\n str_detect(name, 'Invitation 1 No response') ~ 5,\n str_detect(name, 'Invitation 1 Declined') ~ 8.5,\n # invite 2\n str_detect(name, 'Invitation 2 Accepted') ~ 2,\n str_detect(name, 'Invitation 2 No response') ~ 5,\n str_detect(name, 'Invitation 2 Declined') ~ 7.8,\n # invite 3\n str_detect(name, 'Invitation 3 Accepted') ~ 2.7,\n str_detect(name, 'Invitation 3 No response') ~ 5.8,\n str_detect(name, 'Invitation 3 Declined') ~ 7.2,\n # final outcomes\n str_detect(name, 'Accepted') ~ 1,\n str_detect(name, 'No response') ~ 5,\n str_detect(name, 'Declined') ~ 8,\n .default = 5\n ) |> rescale(to = c(0.001, 0.999))\n ) |> \n # add in a custom field to show the percentage flow\n left_join(\n y = df_flows |> \n group_by(to) |> \n summarise(\n flow = sum(flow, na.rm = T),\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1),\n ) |> \n select(name = to, flow_perc),\n by = 'name'\n )\n\n# view our nodes data\nnodes |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the nodes dataframe\n\n\nNext we move to styling the edges, which is a much simpler prospect:\n\n\nCode\nedges <- edges |> \n mutate(\n # add a label for each flow to tell us how many people are in each\n label = number(flow, big.mark = ','),\n # add a percentage flow figure\n flow_perc = percent(flow / temp_eligible_pop, accuracy = 0.1)\n ) |> \n # add the faded colour from our nodes object to match the destinations\n left_join(\n y = nodes |> select(to = id, colour_fade),\n by = 'to'\n )\n\n# view our edges data\nedges |> \n reactable(defaultPageSize = 5)\n\n\n\n\nStyling the edges dataframe\n\n\nWe now have stylised node and edge tables ready and can bring it all together. 
Note the use of customdata and hovertemplate help to bring in additional information and styling to the pop-up boxes that appear when you hover over each flow and node.\n\n\nCode\n# plot our stylised sankey\nplot_ly(\n # setup\n type = 'sankey',\n orientation = 'h',\n arrangement = 'snap',\n \n # use our node data\n node = list(\n label = nodes$name,\n color = nodes$colour,\n x = nodes$x,\n y = nodes$y,\n customdata = nodes$flow_perc,\n hovertemplate = '%{label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n ),\n \n # use our edge data\n link = list(\n source = edges$from,\n target = edges$to,\n value = edges$flow,\n label = edges$label,\n color = edges$colour_fade,\n customdata = edges$flow_perc,\n hovertemplate = '%{source.label} → %{target.label}<br /><b>%{value}</b> participants<br /><b>%{customdata}</b> of eligible population'\n )\n) |> \n layout(\n font = list(\n family = 'Arial, Helvetica, sans-serif',\n size = 12\n ),\n # make the background transparent (also removes the text shadow)\n paper_bgcolor = 'rgba(0,0,0,0)'\n ) |> \n config(responsive = T)\n\n\n\n\nA stylish Sankey"
+ },
+ {
+ "objectID": "blogs/posts/2024-01-17_nearest_neighbour.html",
+ "href": "blogs/posts/2024-01-17_nearest_neighbour.html",
+ "title": "Nearest neighbour imputation",
+ "section": "",
+ "text": "Recently I have been gathering data by GP practice, from a variety of different sources. The ultimate purpose of my project is to be able to report at an ICB/sub-ICB level1. The various datasets cover different timescales and consequently changes in GP practices over time have left me with mismatching datasets.\n1 An ICB (Integrated Care Board) is a statutory NHS organisation responsible for planning health services for their local populationsMy approach has been to take as the basis of my project a recent GP List. Later in my project I want to perform calculations at a GP practice level based on an underlying health need and the data for this need is a CHD prevalence value from a dataset that is around 8 years old, and for which there is no update or alternative. From my recent list of 6454 practices, when I match to the need dataset, I am left with 151 practices without a value for need. If I remove these practices from the analysis then this could impact the analysis by sub-ICB since often a group of practices in the same area could be subject to changes, mergers and reorganisation.\nHere’s the packages and some demo objects to work with to create an example for two practices:\n\n\nCode\n# Packages\nlibrary(tidyverse)\nlibrary(sf)\nlibrary(tidygeocoder)\nlibrary(leaflet)\nlibrary(viridisLite)\nlibrary(gt)\n\n# Create some data with two practices with no need data \n# and a selection of practices locally with need data\npractices <- tribble(\n ~practice_code, ~postcode, ~has_orig_need, ~value,\n \"P1\",\"CV1 4FS\", 0, NA,\n \"P2\",\"CV1 3GB\", 1, 7.3,\n \"P3\",\"CV11 5TW\", 1, 6.9,\n \"P4\",\"CV6 3HZ\", 1, 7.1,\n \"P5\",\"CV6 1HS\", 1, 7.7,\n \"P6\",\"CV6 5DF\", 1, 8.2,\n \"P7\",\"CV6 3FA\", 1, 7.9,\n \"P8\",\"CV1 2DL\", 1, 7.5,\n \"P9\",\"CV1 4JH\", 1, 7.7,\n \"P10\",\"CV10 0GQ\", 1, 7.5,\n \"P11\",\"CV10 0JH\", 1, 7.8,\n \"P12\",\"CV11 5QT\", 0, NA,\n \"P13\",\"CV11 6AB\", 1, 7.6,\n \"P14\",\"CV6 4DD\", 1,7.9\n) \n\n# get domain of numeric 
data\n(domain <- range(practices$has_orig_need))\n\n# make a colour palette\npal <- colorNumeric(palette = viridis(2), domain = domain)\n\n\nTo provide a suitable estimate of need for the newer practices without values, all the practices in the dataset were geocoded2 using the geocode function from the {tidygeocoder} package.\n2 Geocoding is the process of converting addresses (often the postcode) into geographic coordinates (such as latitude and longitude) that can be plotted on a map.\npractices <- practices |>\n mutate(id = row_number()) |>\n geocode(postalcode = postcode) |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\n\nCode\npractices |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nhas_orig_need\nvalue\nid\ngeometry\n\n\n\n\nP1\nCV1 4FS\n0\nNA\n1\nc(-1.50686326666667, 52.4141089666667)\n\n\nP2\nCV1 3GB\n1\n7.3\n2\nc(-1.51888, 52.4034199)\n\n\nP3\nCV11 5TW\n1\n6.9\n3\nc(-1.46746, 52.519)\n\n\nP4\nCV6 3HZ\n1\n7.1\n4\nc(-1.52231, 52.42367)\n\n\nP5\nCV6 1HS\n1\n7.7\n5\nc(-1.52542, 52.41989)\n\n\nP6\nCV6 5DF\n1\n8.2\n6\nc(-1.498344825, 52.4250186)\n\n\nP7\nCV6 3FA\n1\n7.9\n7\nc(-1.51787, 52.43135)\n\n\nP8\nCV1 2DL\n1\n7.5\n8\nc(-1.49105, 52.40582)\n\n\nP9\nCV1 4JH\n1\n7.7\n9\nc(-1.50653, 52.41953)\n\n\nP10\nCV10 0GQ\n1\n7.5\n10\nc(-1.52197, 52.54074)\n\n\nP11\nCV10 0JH\n1\n7.8\n11\nc(-1.5163199, 52.53723)\n\n\nP12\nCV11 5QT\n0\nNA\n12\nc(-1.46927, 52.51899)\n\n\nP13\nCV11 6AB\n1\n7.6\n13\nc(-1.45822, 52.52682)\n\n\nP14\nCV6 4DD\n1\n7.9\n14\nc(-1.50832, 52.44104)\n\n\n\n\n\n\n\nThis map shows the practices, purple are the practices with no need data and yellow are practices with need data available.\n\n\nCode\n# make map to display practices\nleaflet(practices) |> \n addTiles() |>\n addCircleMarkers(color = ~pal(has_orig_need)) \n\n\n\n\n\n\nThe data was split into those with, and without, a value for need. 
st_join from the {sf} package was then used to join those without a value for need to those with one, using the geometry to find all practices within 1500m (1.5km).\n\nno_need <- practices |>\n filter(has_orig_need == 0)\n\nwith_need <- practices |>\n filter(has_orig_need == 1)\n\n\nneighbours <- no_need |>\n select(no_need_postcode = postcode,no_need_prac_code=practice_code) |>\n st_join(with_need, st_is_within_distance, 1500) |>\n st_drop_geometry() |>\n select(id, no_need_postcode,no_need_prac_code) |>\n inner_join(x = with_need, by = join_by(\"id\")) \n\n\n\nCode\nleaflet(neighbours) |> \n addTiles() |>\n addCircleMarkers(color = \"purple\") |>\n addMarkers( -1.50686326666667, 52.4141089666667, popup = \"Practice with no data\"\n) |>\n addCircles(-1.50686326666667, 52.4141089666667,radius=1500) |>\n addMarkers(-1.46927, 52.51899, popup = \"Practice with no data\"\n) |>\naddCircles(-1.46927, 52.51899,radius=1500)\n\n\n\n\n\nThe data for the “neighbours” was grouped by the practice code of those without need data and a mean value was calculated for each practice to generate an estimated value.\n\nneighbours_estimate <- neighbours |>\n group_by(no_need_prac_code) |>\n summarise(need_est=mean(value)) |>\n st_drop_geometry() \n\nThe estimated values were then joined back to the original data.\n\n practices_with_neighbours_estimate <- practices |>\n left_join(neighbours_estimate, join_by(practice_code==no_need_prac_code)) |>\n st_drop_geometry()\n\n\n\nCode\n practices_with_neighbours_estimate |>\n select(-has_orig_need,-id) |>\n gt()\n\n\n\n\n\n\n\n\npractice_code\npostcode\nvalue\nneed_est\n\n\n\n\nP1\nCV1 4FS\nNA\n7.583333\n\n\nP2\nCV1 3GB\n7.3\nNA\n\n\nP3\nCV11 5TW\n6.9\nNA\n\n\nP4\nCV6 3HZ\n7.1\nNA\n\n\nP5\nCV6 1HS\n7.7\nNA\n\n\nP6\nCV6 5DF\n8.2\nNA\n\n\nP7\nCV6 3FA\n7.9\nNA\n\n\nP8\nCV1 2DL\n7.5\nNA\n\n\nP9\nCV1 4JH\n7.7\nNA\n\n\nP10\nCV10 0GQ\n7.5\nNA\n\n\nP11\nCV10 0JH\n7.8\nNA\n\n\nP12\nCV11 
5QT\nNA\n7.250000\n\n\nP13\nCV11 6AB\n7.6\nNA\n\n\nP14\nCV6 4DD\n7.9\nNA\n\n\n\n\n\n\n\nFinally, an updated data frame was created of the need data using the actual need for the practice where available, otherwise using estimated need.\n\npractices_with_neighbours_estimate <- practices_with_neighbours_estimate |>\n mutate(need_to_use = case_when(value>=0 ~ value,\n .default = need_est)) |>\n select(practice_code,need_to_use) \n\n\n\n\n\n\n\n\n\npractice_code\nneed_to_use\n\n\n\n\nP1\n7.583333\n\n\nP2\n7.300000\n\n\nP3\n6.900000\n\n\nP4\n7.100000\n\n\nP5\n7.700000\n\n\nP6\n8.200000\n\n\nP7\n7.900000\n\n\nP8\n7.500000\n\n\nP9\n7.700000\n\n\nP10\n7.500000\n\n\nP11\n7.800000\n\n\nP12\n7.250000\n\n\nP13\n7.600000\n\n\nP14\n7.900000\n\n\n\n\n\n\n\nFor my project, this method has successfully generated a prevalence for 125 of the 151 practices without a need value, leaving just 26 practices without a need. This is using a 1.5 km radius. In each use case there will be a decision to make regarding a more accurate estimate (smaller radius) and therefore fewer matches versus a less accurate estimate (using a larger radius) and therefore more matches.\nThis approach could be replicated for other similar uses/purposes. A topical example from an SU project is the need to assign population prevalence for hypertension and compare it to current QOF3 data. Again, the prevalence data is a few years old so we have to move the historical data to fit with current practices and this leaves missing data that can be estimated using this method.\n\n\n3 QOF (Quality and Outcomes Framework) is a voluntary annual reward and incentive programme for all GP practices in England, detailing practice achievement results."
+ },
+ {
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/index.html",
+ "title": "Storing data safely",
+ "section": "",
+ "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!"
+ },
+ {
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#coffee-coding",
+ "title": "Storing data safely",
+ "section": "",
+ "text": "In a recent Coffee & Coding session we chatted about storing data safely for use in Reproducible Analytical Pipelines (RAP), and the slides from the presentation are now available. We discussed the use of Posit Connect Pins and Azure Storage.\nIn order to avoid duplication, this blog post will not cover the pros and cons of each approach, and will instead focus on documenting the code that was used in our live demonstrations. I would recommend that you look through the slides before using the code in this blogpost and have them alongside, as they provide lots of useful context!"
+ },
+ {
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#posit-connect-pins",
+ "title": "Storing data safely",
+ "section": "Posit Connect Pins",
+ "text": "Posit Connect Pins\n\n# A brief intro to using {pins} to store, version, share and protect a dataset\n# on Posit Connect. Documentation: https://pins.rstudio.com/\n\n\n# Setup -------------------------------------------------------------------\n\n\ninstall.packages(c(\"pins\",\"dplyr\")) # if not yet installed\n\nsuppressPackageStartupMessages({\n library(pins)\n library(dplyr) # for wrangling and the 'starwars' demo dataset\n})\n\nboard <- board_connect() # will error if you haven't authenticated before\n# Error in `check_auth()`: ! auth = `auto` has failed to find a way to authenticate:\n# • `server` and `key` not provided for `auth = 'manual'`\n# • Can't find CONNECT_SERVER and CONNECT_API_KEY envvars for `auth = 'envvar'`\n# • rsconnect package not installed for `auth = 'rsconnect'`\n# Run `rlang::last_trace()` to see where the error occurred.\n\n# To authenticate\n# In RStudio: Tools > Global Options > Publishing > Connect... > Posit Connect\n# public URL of the Strategy Unit Posit Connect Server: connect.strategyunitwm.nhs.uk\n# Your browser will open to the Posit Connect web page and you're prompted\n# for your password. Enter it and you'll be authenticated.\n\n# Once authenticated\nboard <- board_connect()\n# Connecting to Posit Connect 2024.03.0 at\n# <https://connect.strategyunitwm.nhs.uk>\n\nboard |> pin_list() # see all the pins on that board\n\n\n# Create a pin ------------------------------------------------------------\n\n\n# Write a dataset to the board as a pin\nboard |> pin_write(\n x = starwars,\n name = \"starwars_demo\"\n)\n# Guessing `type = 'rds'`\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_exists(\"starwars_demo\")\n# ! 
Use a fully specified name including user name: \"matt.dray/starwars_demo\",\n# not \"starwars_demo\".\n# [1] TRUE\n\npin_name <- \"matt.dray/starwars_demo\"\n\nboard |> pin_exists(pin_name) # logical, TRUE/FALSE\nboard |> pin_meta(pin_name) # metadata, see also 'metadata' arg in pin_write()\nboard |> pin_browse(pin_name) # view the pin in the browser\n\n\n# Permissions -------------------------------------------------------------\n\n\n# You can let people see and edit a pin. Log into Posit Connect and select the\n# pin under 'Content'. In the 'Settings' panel on the right-hand side, adjust\n# the 'sharing' options in the 'Access' tab.\n\n\n# Overwrite and version ---------------------------------------------------\n\n\nstarwars_droids <- starwars |>\n filter(species == \"Droid\") # beep boop\n\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# Writing to pin 'matt.dray/starwars_demo'\n\nboard |> pin_versions(pin_name) # see version history\nboard |> pin_versions_prune(pin_name, n = 1) # remove history\nboard |> pin_versions(pin_name)\n\n# What if you try to overwrite the data but it hasn't changed?\nboard |> pin_write(\n starwars_droids,\n pin_name,\n type = \"rds\"\n)\n# ! The hash of pin \"matt.dray/starwars_demo\" has not changed.\n# • Your pin will not be stored.\n\n\n# Use the pin -------------------------------------------------------------\n\n\n# You can read a pin to your local machine, or access it from a Quarto file\n# or Shiny app hosted on Connect, for example. If the output and the pin are\n# both on Connect, no authentication is required; the board is defaulted to\n# the Posit Connect instance where they're both hosted.\n\nboard |>\n pin_read(pin_name) |> # like you would use e.g. 
read_csv\n with(data = _, plot(mass, height)) # wow!\n\n\n# Delete pin --------------------------------------------------------------\n\n\nboard |> pin_exists(pin_name) # logical, good function for error handling\nboard |> pin_delete(pin_name)\nboard |> pin_exists(pin_name)"
+ },
+ {
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-r",
+ "title": "Storing data safely",
+ "section": "Azure Storage in R",
+ "text": "Azure Storage in R\nYou will need an .Renviron file with the four environment variables listed below for the code to work. This .Renviron file should be ignored by git. You can share the contents of .Renviron files with other team members via Teams, email, or SharePoint.\nBelow is a sample .Renviron file:\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nAZ_STORAGE_CONTAINER=container-name\nAZ_TENANT_ID=long-sequence-of-numbers-and-letters\nAZ_APP_ID=another-long-sequence-of-numbers-and-letters\n\ninstall.packages(c(\"AzureAuth\",\"AzureStor\", \"arrow\")) # if not yet installed\n\n# Load all environment variables\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")\n\n# Authenticate\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\"\n)\n\n# If you have not authenticated before, you will be taken to an external page to\n# authenticate! Use your mlcsu.nhs.uk account.\n\n# Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\n\n# If you get a 403 error when trying to interact with the container, you may \n# have to clear your Azure token and re-authenticate using a different browser.\n# Use AzureAuth::clean_token_directory() to clear your token, then repeat the\n# AzureAuth::get_azure_token() step above.\n\n# Upload specific file to container\nAzureStor::storage_upload(container, \"data/ronald.jpeg\", \"newdir/ronald.jpeg\")\n\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(container, \"data/*\", \"newdir\")\n\n# Check files have uploaded\nblob_list <- AzureStor::list_blobs(container)\n\n# Load file directly from Azure 
container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by temporarily downloading file \n# and storing it in memory)\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\nparq_df <- arrow::read_parquet(parquet_in_memory)\n\n# Delete from Azure container (!!!)\nfor (blobfile in blob_list$name) {\n AzureStor::delete_storage_file(container, blobfile)\n}"
+ },
+ {
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/index.html#azure-storage-in-python",
+ "title": "Storing data safely",
+ "section": "Azure Storage in Python",
+ "text": "Azure Storage in Python\nThis will use the same environment variables as the R version, just stored in a .env file instead.\nWe didn’t cover this in the presentation, so it’s not in the slides, but the code should be self-explanatory.\n\n\nimport os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as 
sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg"
+ },
+ {
+ "objectID": "blogs/posts/2024-05-13_one-year-coffee-code.html",
+ "href": "blogs/posts/2024-05-13_one-year-coffee-code.html",
+ "title": "One year of coffee & coding",
+ "section": "",
+ "text": "The data science team have been running coffee & coding sessions for just over a year now. When I joined the Strategy Unit, I was really pleased to see these sessions running, as I think making time to discuss and share technical knowledge is highly valuable, especially as an organisation grows.\nCoffee and coding sessions run every two weeks and usually take the form of a short presentation followed by a discussion, although we have also had a variety of other formats, including live coding demos and show-and-tell sessions for projects.\nWe figured it would be a good idea to do a quick survey of attendees to make sure that the sessions were beneficial and see if there were any suggestions for future sessions. We had 11 responses, all of which were really positive, with 90% agreeing that the sessions are interesting, and over 80% saying that they learn new things. Respondents said that the sessions were well varied across the technical spectrum and that they “almost always learn something useful”.\nThe two main themes of the results were that sessions were inclusive and sparked collaboration. ✨\n\nI like that everyone can contribute\n\n\nIt’s great seeing what else people are doing\n\n\nI get more ideas for future projects\n\nSome of the main suggestions included more content for newer programmers and encouraging the wider analytical team to share real project examples.\nSo with that, why not consider presenting? The sessions are informal and everyone is welcome to contribute. If you’ve got something to share, please let a member of the data science team know.\nAs a reminder, materials for our previous sessions are available under Presentations."
+ },
+ {
+ "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html",
+ "href": "blogs/posts/2023-03-24_hotfix-with-git.html",
+ "title": "Creating a hotfix with git",
+ "section": "",
+ "text": "I recently discovered a bug in a code-base which needed to be fixed and deployed back to production A.S.A.P., but since the last release the code had moved on significantly. The history looks a bit like:\nThat is, we have a tag which is the code that is currently in production (which we need to patch), a number of commits after that tag to main (which were separate branches merged via pull requests), and a current development branch.\nI need to somehow: 1. go back to the tagged release, 2. check that code out, 3. patch that code, 4. commit this change, but insert the commit before all of the new commits after the tag.\nThere are at least two ways that I know to do this: one would be an interactive rebase, but I used a slightly longer method that I feel is a little less likely to go wrong.\nBelow are the steps that I took. One thing I should note is that this worked well for my particular issue because the change didn’t cause any merge conflicts later on."
+ },
+ {
+ "objectID": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
+ "href": "blogs/posts/2023-03-24_hotfix-with-git.html#fixing-my-codebase",
+ "title": "Creating a hotfix with git",
+ "section": "Fixing my codebase",
+ "text": "Fixing my codebase\nFirst, we need to check out the tag\ngit checkout -b hotfix v0.2.0\nThis creates a new branch called hotfix off of the tag v0.2.0.\nNow that I have the code base checked out at the point I need to fix, I can make the change that is needed, and commit the change\ngit add [FILENAME]\ngit commit -m \"fixes the code\"\n(Obviously, I used the actual file name and gave a better commit message. I promise 😝)\nNow my code is fixed, I create a new tag for this “release”, as well as push the code to production (this step is omitted here)\ngit tag v0.2.1 -m \"version 0.2.1\"\nAt this point, our history looks something like\n\n\n\n\n\n\n\n\n\nWhat we want to do is break the link between main and v0.2.0, instead attaching it to v0.2.1. First though, I want to make sure that if I make a mistake, I’m not making it on the main branch.\ngit checkout main\ngit checkout -b apply-hotfix\nThen we can fix our history using the rebase command\ngit rebase hotfix\nThis rolls back to the point where the branch that we are rebasing (apply-hotfix) and the hotfix branch both share a common commit (the v0.2.0 tag). It then applies the commits in the hotfix branch, before reapplying the commits from apply-hotfix (a.k.a. the main branch).\nOne thing to note: if your fix creates any merge conflicts, the rebase will stop and ask you to resolve them. There is some information in the GitHub docs on resolving merge conflicts after a Git rebase: https://docs.github.com/en/get-started/using-git/resolving-merge-conflicts-after-a-git-rebase\nAt this point, we can check that the commit history looks correct\ngit log v0.2.0..HEAD\nIf we are happy, then we can apply this to the main branch. I do this by renaming the apply-hotfix branch as main. 
First, we have to delete the old main branch so that apply-hotfix can be renamed in its place.\ngit branch -D main\ngit branch -m main\nWe also need to update the other branches to use the new main branch\ngit checkout branch\ngit rebase main\nNow, we should have a history like"
},
{
- "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app-1",
- "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#outputs-app-1",
- "title": "An Introduction to the New Hospital Programme Demand Model",
- "section": "Outputs App",
- "text": "Outputs App"
+ "objectID": "blogs/index.html",
+ "href": "blogs/index.html",
+ "title": "Data Science Blog",
+ "section": "",
+ "text": "Map and Nest\n\n\n\n\n\n\npurrr\n\n\nR\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nAug 8, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nStoring data safely\n\n\n\n\n\n\nlearning\n\n\nR\n\n\nPython\n\n\n\n\n\n\n\n\n\nMay 22, 2024\n\n\nYiWen Hon, Matt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nOne year of coffee & coding\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nMay 13, 2024\n\n\nRhian Davies\n\n\n\n\n\n\n\n\n\n\n\n\nRStudio Tips and Tricks\n\n\n\n\n\n\nlearning\n\n\nR\n\n\n\n\n\n\n\n\n\nMar 21, 2024\n\n\nMatt Dray\n\n\n\n\n\n\n\n\n\n\n\n\nVisualising participant recruitment in R using Sankey plots\n\n\n\n\n\n\nlearning\n\n\ntutorial\n\n\nvisualisation\n\n\nR\n\n\n\n\n\n\n\n\n\nFeb 28, 2024\n\n\nCraig Parylo\n\n\n\n\n\n\n\n\n\n\n\n\nNearest neighbour imputation\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 17, 2024\n\n\nJacqueline Grout\n\n\n\n\n\n\n\n\n\n\n\n\nAdvent of Code and Test Driven Development\n\n\n\n\n\n\nlearning\n\n\n\n\n\n\n\n\n\nJan 10, 2024\n\n\nYiWen Hon\n\n\n\n\n\n\n\n\n\n\n\n\nReinstalling R Packages\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nAlternative remote repositories\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nApr 26, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\n\n\n\n\n\n\nCreating a hotfix with git\n\n\n\n\n\n\ngit\n\n\ntutorial\n\n\n\n\n\n\n\n\n\nMar 24, 2023\n\n\nTom Jemmett\n\n\n\n\n\n\nNo matching items"
},
{
- "objectID": "presentations/2023-07-11_haca-nhp-demand-model/index.html#questions",
- "href": "presentations/2023-07-11_haca-nhp-demand-model/index.html#questions",
- "title": "An Introduction to the New Hospital Programme Demand Model",
- "section": "Questions?",
- "text": "Questions?\n\nContact The Strategy Unit\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\nContact Me\n\n\n thomas.jemmett@nhs.net\n tomjemmett\n\n\n\n\n\nview slides at https://tinyurl.com/haca23nhp"
+ "objectID": "about.html",
+ "href": "about.html",
+ "title": "About",
+ "section": "",
+ "text": "The Data Science team at the Strategy Unit comprises the following team members:\n\nChris Beeley\nMatt Dray\nOzayr Mohammed\nRhian Davies\nTom Jemmett\nYiWen Hon\n\nCurrent and previous projects of note include:\n\nWork supporting the New Hospitals Programme, including building a model for predicting the demand and capacity requirements of hospitals in the future, and a tool for mapping the evidence on this topic.\nThe Patient Experience Qualitative Data Categorisation project\nWork supporting the wider analytical community, through events/communities such as NHS-R and HACA."
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#which-is-easier-to-read",
- "title": "Coffee and Coding",
- "section": "Which is easier to read?",
- "text": "Which is easier to read?\n\nae_attendances |>\n filter(org_code %in% c(\"RNA\", \"RL4\")) |>\n mutate(performance = 1 + breaches / attendances) |>\n filter(type == 1) |>\n mutate(met_target = performance >= 0.95)\n\nor\n\nae_attendances |>\n filter(\n org_code %in% c(\"RNA\", \"RL4\"),\n type == 1\n ) |>\n mutate(\n performance = 1 + breaches / attendances,\n met_target = performance >= 0.95\n )\n\n\n spending a few seconds to neatly format your code can greatly improve the legibility to future readers, making the intent of the code far clearer, and will make finding bugs easier to spot.\n\n\n (have you spotted the mistake in the snippets above?)"
+ "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html",
+ "href": "blogs/posts/2024-08-08-map-and-nest/index.html",
+ "title": "Map and Nest",
+ "section": "",
+ "text": "I want to share a framework that I like using occasionally for data analysis. It’s the nest-and-map approach, and it’s helped me countless times when I’m working with related datasets. By combining {purrr} mapping with {tidyr} nesting, I can keep my analysis steps linked, allowing me to easily track from a summary or plot back to the original data.\nThe main functions we’ll need are"
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#tidyverse-style-guide",
- "title": "Coffee and Coding",
- "section": "Tidyverse Style Guide",
- "text": "Tidyverse Style Guide\n\nGood coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread\n\n\nAll style guides are fundamentally opinionated. Some decisions genuinely do make code easier to use (especially matching indenting to programming structure), but many decisions are arbitrary. The most important thing about a style guide is that it provides consistency, making code easier to write because you need to make fewer decisions.\n\ntidyverse style guide"
+ "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#example-on-nhs-workforce-statistics",
+ "href": "blogs/posts/2024-08-08-map-and-nest/index.html#example-on-nhs-workforce-statistics",
+ "title": "Map and Nest",
+ "section": "Example on NHS workforce statistics",
+ "text": "Example on NHS workforce statistics\nThe NHS workforce statistics are official statistics published monthly for England.\n\nstaff_group <- readRDS(file = \"workforce_staff_group.rds\")\n\nI want to perform an analysis for each of the 42 integrated care systems (ICS). The {tidyr} nest() function creates a list-column, where each cell contains a mini dataframe for each grouping.\nLet’s group by ICS, and call the nested data column raw_data.\n\ngroup_by_ics <- staff_group |>\n tidyr::nest(raw_data = -ics_name)\n\nThe new column is a list-column, with each cell containing an entire tibble of data relating to that individual ICS.\n\n#' echo: false\nhead(group_by_ics)\n\n# A tibble: 6 × 2\n ics_name raw_data \n <chr> <list> \n1 South East London <tibble [8 × 6]> \n2 North East London <tibble [7 × 6]> \n3 North Central London <tibble [12 × 6]>\n4 North West London <tibble [10 × 6]>\n5 South West London <tibble [8 × 6]> \n6 Devon <tibble [7 × 6]> \n\n\nWe can grab these mini datasets in the usual way and explore them interactively.\n\ngroup_by_ics$raw_data[[1]]\n\n# A tibble: 8 × 6\n organisation_name total hchs_doctors nurses_health_visitors midwives\n <chr> <dbl> <dbl> <dbl> <dbl>\n1 Total 58394 7108 14939 926\n2 Guy's and St Thomas' NHS F… 21361 3003 6196 281\n3 King's College Hospital NH… 13158 2443 4202 375\n4 Lewisham and Greenwich NHS… 6617 979 2103 271\n5 London Ambulance Service N… 7050 4 44 0\n6 NHS South East London ICB 617 9 43 0\n7 Oxleas NHS Foundation Trust 4094 200 1196 0\n8 South London and Maudsley … 5496 471 1155 0\n# ℹ 1 more variable: ambulance_staff <dbl>\n\n\nNext, let’s apply some simple processing, say converting absolute numbers into percentages, to each of the ICSs in turn.\nWe use mutate() to create a new list-column staff_percent and map() to apply the processing function to each cell in turn. 
1\n\n\nSee function definition for convert_percent()\n\n\n#' Convert percent\n#' @param staff Tibble containing organisation_name, total and a number of staff categories\n#' @return Tibble like staff but with staff categories represented as percentages rather than absolute numbers\nconvert_percent <- function(staff){\n staff |>\n dplyr::mutate(dplyr::across(.cols = -c(organisation_name, total),\n .fns = \\(x)x/total)) |>\n dplyr::rename(\"Doctors\" = \"hchs_doctors\",\n \"Nurses\" = \"nurses_health_visitors\",\n \"Ambulance staff\" = \"ambulance_staff\",\n \"Midwives\" = \"midwives\")\n}\n\n\n\nprocessed_staff <-\ngroup_by_ics |>\n dplyr::mutate(\n staff_percent = purrr::map(raw_data, convert_percent)\n )\n\nWhere I think this nest-and-map process really comes into its own is creating plots. Often, I find myself wanting to create a couple of different plots for each grouping, and then optionally save the plots with sensible names. Particularly in the analysis stage, I like having these plots in the same row as the raw data, so I can quickly compare and validate.\nI’ve created two functions, plot_barchart() and plot_waffle(), which take the data and create charts.\n\n\nSee definition for plot_barchart() & plot_waffle()\n\n\n#' Plot barchart\n#' Makes a bar chart of staff percentages by organisation\n#' @param df tibble of staff data in percent format\nplot_barchart <- function(df) {\n df |>\n dplyr::filter(organisation_name != \"Total\") |>\n dplyr::select(-total) |>\n tidyr::pivot_longer(cols = -c(organisation_name), names_to = \"job\", values_to = \"percent\") |>\n ggplot2::ggplot(ggplot2::aes(x = percent, y = organisation_name, fill = job)) +\n ggplot2::geom_col(position = \"dodge\") + \n ggplot2::scale_x_continuous(labels = scales::percent_format(scale = 100)) +\n ggplot2::labs(x = \"\", y = \"\") +\n StrategyUnitTheme::scale_fill_su() + \n ggplot2::theme_minimal() + \n ggplot2::theme(legend.title = ggplot2::element_blank())\n}\n\n#' Plot waffle\n#' Makes 
a waffle chart to visualise staff breakdown at an ICS level\n#' @param raw_staff count data of staff\n#' @param title Title for the graphic\nplot_waffle <- function(raw_staff, title) {\nwaffle_data <-\nraw_staff |>\n dplyr::filter(organisation_name == \"Total\") |>\n dplyr::select(-total, -organisation_name) |>\n tidyr::pivot_longer(cols = dplyr::everything(), names_to = \"names\", values_to = \"vals\") |>\n dplyr::mutate(vals = round(vals / 100))\n\nggplot2::ggplot(waffle_data, ggplot2::aes(fill = names, values = vals)) +\n waffle::geom_waffle(n_rows = 8, size = 0.33, colour = \"white\") +\n ggplot2::coord_equal() +\n ggplot2::theme_void() + \n ggplot2::theme(legend.title = ggplot2::element_blank()) +\n ggplot2::ggtitle(title)\n}\n\n\nAgain, using mutate() I can create a new column called barchart and I can map() the function plot_barchart(), applying it to each row at a time.\n\ngraphs <-\nprocessed_staff |>\n dplyr::mutate(\n barchart = purrr::map(staff_percent, plot_barchart)\n ) \n\nThe resulting column barchart is again a list-column, but this time instead of containing a tibble, it holds a ggplot object. A whole ggplot in a single cell. 2\nIf we want to pass two arguments to our function, we can replace map() with map2(). Here we’re using map2() to pass the ics_name column to use as a title in our waffle plot. 3\n\ngraphs <-\nprocessed_staff |>\n dplyr::mutate(\n waffle = purrr::map2(raw_data, ics_name, \n \\(data, title) plot_waffle(data, title)\n )\n ) \n\n\n\n\nAn example bar chart plot"
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#lintr-styler-are-your-new-best-friends",
- "title": "Coffee and Coding",
- "section": "{lintr} + {styler} are your new best friends",
- "text": "{lintr} + {styler} are your new best friends\n\n\n{lintr}\n\n{lintr} is a static code analysis tool that inspects your code (without running it)\nit checks for certain classes of errors (e.g. mismatched { and (’s)\nit warns about potential issues (e.g. using variables that aren’t defined)\nit warns about places where you are not adhering to the code style\n\n\n{styler}\n\n{styler} is an RStudio add in that automatically reformats your code, tidying it up to match the style guide\n99.9% of the time it will give you equivalent code, but there is the potential that it may change the behaviour of your code\nit will overwrite the files that you ask it to run on however, so it is vital to be using version control\na good workflow here is to save your file, “stage” the changes to your file, then run {styler}. You can then revert back to the staged changed if needed."
+ "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#putting-it-all-together",
+ "href": "blogs/posts/2024-08-08-map-and-nest/index.html#putting-it-all-together",
+ "title": "Map and Nest",
+ "section": "Putting it all together",
+ "text": "Putting it all together\nAll of these mutate() steps can actually be called in one step. Here’s the full workflow again after a little refactor. I’ve also used pivot_longer() to move the two plotting columns into a single plot column. This will make it easier for me to generate nice filenames, and save the plots.\n\nresults <-\nstaff_group |>\n tidyr::nest(raw_data = -ics_name) |>\n dplyr::mutate(\n staff_percent = purrr::map(raw_data, convert_percent),\n barchart = purrr::map(staff_percent, plot_barchart),\n waffle = purrr::map2(raw_data, ics_name, \\(data, title) plot_waffle(data, title)) \n ) |>\n tidyr::pivot_longer(cols = c(barchart, waffle), names_to = \"plot_type\", values_to = \"plot\") |>\n dplyr::mutate(filename = glue::glue(\"{snakecase::to_snake_case(ics_name)}_{plot_type}.png\"))\n\nThe walk() family of functions in {purrr} is used when the function you’re applying does not return an object, but is being used for its side effect, for example reading or writing files.\nHere we call walk2(), passing in both the filename and plot columns as arguments to save all the plots.\n\npurrr::walk2(\n results$filename,\n results$plot,\n \\(filename, plot) ggplot2::ggsave(file.path(\"plots\", filename), plot, width = 10, height = 6)\n)\n\nBy keeping everything together in one nested structure, I personally find it much easier to keep track of my analyses. If you’re doing a more complex or permanent analysis, you might want to consider setting up a more formal data processing pipeline, and following RAP principles."
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#what-does-lintr-look-like",
- "title": "Coffee and Coding",
- "section": "What does {lintr} look like?",
- "text": "What does {lintr} look like?\n\n\n\nsource: Good practice for writing R code and R packages\n\nrunning lintr can be done in the console, e.g.\n\nlintr::lintr_dir(\".\")\n\nor via the Addins menu"
+ "objectID": "blogs/posts/2024-08-08-map-and-nest/index.html#footnotes",
+ "href": "blogs/posts/2024-08-08-map-and-nest/index.html#footnotes",
+ "title": "Map and Nest",
+ "section": "Footnotes",
+ "text": "Footnotes\n\n\nIn this example, we actually didn’t need to nest first. We could have performed the mutate() step on the full dataset.↩︎\nThis totally blew my mind the first time I saw it 🤯.↩︎\nWe’re mapping the two inputs to plot_waffle() with an anonymous function. This shorthand syntax for anonymous functions was introduced in R 4.1.0. For compatibility with older versions of R, you’ll need the ~ operator. For the different ways you can specify functions in {purrr}, see the help file.↩︎"
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#using-styler",
- "title": "Coffee and Coding",
- "section": "Using {styler}",
- "text": "Using {styler}\n\nsource: Good practice for writing R code and R packages"
+ "objectID": "blogs/posts/2023-04-26-reinstalling-r-packages.html",
+ "href": "blogs/posts/2023-04-26-reinstalling-r-packages.html",
+ "title": "Reinstalling R Packages",
+ "section": "",
+ "text": "R 4.3.0 was released last week. Any time you update R, you will probably find yourself in the position where no packages are installed. This is by design - the packages that you have installed may need to be updated and recompiled to work under new versions of R.\nYou may find yourself wanting to have all of the packages that you previously used, so one approach that some people take is to copy the previous library folder to the new version’s folder. This isn’t a good idea and could potentially break your R install.\nAnother approach would be to export the list of packages in R before updating and then use that list after you have updated R. This can cause issues though if you install from places other than CRAN, e.g. Bioconductor, or from GitHub.\nSome of these approaches are discussed on the RStudio Community Forum. But I prefer to have a “spring clean”, installing only the packages that I know I need.\nI maintain a list of the packages that I use as a gist. Using this, I can then simply run the script on any new R install. In fact, if you click the “raw” button on the gist, and copy that url, you can simply run\nsource(\"https://gist.githubusercontent.com/tomjemmett/c105d3e0fbea7558088f68c65e68e1ed/raw/a1db4b5fa0d24562d16d3f57fe8c25fb0d8aa53e/setup.R\")\nGenerally, sourcing a url is a bad idea - if it’s not a link that you control, then someone could update the contents and run arbitrary code on your machine. In this case, I’m happy to run this as it’s my own gist, but you should be mindful if running it yourself!\nIf you look at the script, I first install a number of packages from CRAN, then I install packages that only exist on GitHub."
},
{
- "objectID": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility",
- "href": "presentations/2023-03-09_coffee-and-coding/index.html#further-thoughts-on-improving-code-legibility",
- "title": "Coffee and Coding",
- "section": "Further thoughts on improving code legibility",
- "text": "Further thoughts on improving code legibility\n\ndo not let files grow too big\nbreak up logic into separate files, then you can use source(\"filename.R) to run the code in that file\nidealy, break up your logic into separate functions, each function having it’s own file, and then call those functions within your analysis\ndo not repeat yourself - if you are copying and pasting your code then you should be thinking about how to write a single function to handle this repeated logic\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html",
+ "title": "RStudio Tips and Tricks",
+ "section": "",
+ "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#what-is-testing",
- "title": "Unit testing in R",
- "section": "What is testing?",
- "text": "What is testing?\n\nSoftware testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to appreciate and understand the risks of software implementation\nwikipedia"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#coffee-coding",
+ "title": "RStudio Tips and Tricks",
+ "section": "",
+ "text": "In a recent Coffee & Coding session we chatted about tips and tricks for RStudio, the popular and free Integrated Development Environment (IDE) that many Strategy Unit analysts use to write R code.\nRStudio has lots of neat features but many are tucked away in submenus. This session was a chance for the community to uncover and discuss some hidden gems to make our work easier and faster."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code",
- "title": "Unit testing in R",
- "section": "How can we test our code?",
- "text": "How can we test our code?\n\n\nStatically\n\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\n\nDynamically"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#official-guidance",
+ "title": "RStudio Tips and Tricks",
+ "section": "Official guidance",
+ "text": "Official guidance\nPosit is the company that builds and maintains RStudio. They host a number of cheatsheets on their website, including one for RStudio. They also have a more in-depth user guide."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-can-we-test-our-code-1",
- "title": "Unit testing in R",
- "section": "How can we test our code?",
- "text": "How can we test our code?\n\n\nStatically\n\n(without executing the code)\nhappens constantly, as we are writing code\nvia code reviews\ncompilers/interpreters/linters statically analyse the code for syntax errors\n\n\n\n\nDynamically\n\n\n(by executing the code)\nsplit into functional and non-functional testing\ntesting can be manual, or automated\n\n\n\n\n\nnon-functional testing covers things like performance, security, and usability testing"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#command-palette",
+ "title": "RStudio Tips and Tricks",
+ "section": "Command palette",
+ "text": "Command palette\nRStudio has a powerful built-in Command Palette, which is a special search box that gives instant access to features and settings without needing to find them in the menus. Many of the tips and tricks we discussed can be found by searching in the Palette. Open it with the keyboard shortcut Ctrl + Shift + P.\n\n\n\nOpening the Command Palette.\n\n\nFor example, let’s say you forgot how to restart R. If you open the Command Palette and start typing ‘restart’, you’ll see the option ‘Restart R Session’. Clicking it will do exactly that. Handily, the Palette also displays the keyboard shortcut (Control + Shift + F10 on Windows) as a reminder.\nAs for settings, a search for ‘rainbow’ in the Command Palette will find ‘Use rainbow parentheses’, an option to help prevent bracket-mismatch errors by colouring pairs of parentheses. What’s nice is that the checkbox to toggle the feature appears right there in the palette so you can change it immediately.\nI refer to menu paths and keyboard shortcuts in the rest of this post, but bear in mind that you can use the Command Palette instead."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests",
- "title": "Unit testing in R",
- "section": "Different types of functional tests",
- "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\n\n\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements."
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#options",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#options",
+ "title": "RStudio Tips and Tricks",
+ "section": "Options",
+ "text": "Options\nIn general, most settings can be found under Tools > Global Options… and many of these are discussed in the rest of this post.\n\n\n\nAdjusting workspace and history settings.\n\n\nBut there are a few settings in particular that we recommend you change to help maximise reproducibility and reduce the chance of confusion. Under General > Basic, uncheck ‘Restore .RData into workspace at startup’ and select ‘Never’ from the dropdown options next to ‘Save workspace to .RData on exit’. These options mean you start with the ‘blank slate’ of an empty environment when you open a project, allowing you to rebuild objects from scratch1."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-1",
- "title": "Unit testing in R",
- "section": "Different types of functional tests",
- "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\n\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nUnit, Integration, and E2E testing are all things we can automate in code, whereas UAT testing is going to be manual"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#keyboard-shortcuts",
+ "title": "RStudio Tips and Tricks",
+ "section": "Keyboard shortcuts",
+ "text": "Keyboard shortcuts\nYou can speed up day-to-day coding with keyboard shortcuts instead of clicking buttons in the interface.\nYou can see some available shortcuts in RStudio if you navigate to Help > Keyboard Shortcuts Help, or use the shortcut Alt + Shift + K (how meta). You can go to Help > Modify Keyboard Shortcuts… to search all shortcuts and change them to what you prefer2.\nWe discussed a number of handy shortcuts that we use frequently3. You can:\n\nre-indent lines to the appropriate depth with Control + I\nreformat code with Control + Shift + A\nturn one or more lines into a comment with Control + Shift + C\ninsert the pipe operator (%>% or |>4) with Control + Shift + M5\ninsert the assignment arrow (<-) with Alt + - (hyphen)\nhighlight a function in the script or console and press F1 to open the function documentation in the ‘Help’ pane\nuse ‘Find in Files’ to search for a particular variable, function or string across all the files in your project, with Control + Shift + F"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#different-types-of-functional-tests-2",
- "title": "Unit testing in R",
- "section": "Different types of functional tests",
- "text": "Different types of functional tests\nUnit Testing checks each component (or unit) for accuracy independently of one another.\n\nIntegration Testing integrates units to ensure that the code works together.\nEnd-to-End Testing (e2e) makes sure that the entire system functions correctly.\nUser Acceptance Testing (UAT) ensures that the product meets the real user’s requirements.\n\n\nOnly focussing on unit testing in this talk, but the techniques/packages could be extended to integration testing. Often other tools (potentially specific tools) are needed for E2E testing."
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#themes",
+ "title": "RStudio Tips and Tricks",
+ "section": "Themes",
+ "text": "Themes\nYou can change a number of settings to alter RStudio’s theme, colours and fonts to whatever you desire.\nYou can change the default theme in Tools > Global Options… > Appearance > Editor theme and select one from the pre-installed list. You can upload new themes by clicking the ‘Add’ button and selecting a theme from your computer. They typically have the file extension .rsthemes and can be downloaded from the web, or you can create or tweak one yourself. The {rsthemes} package has a number of options and also allows you to switch between themes and automatically switch between light and dark themes depending on the time of day.\n\n\n\nCustomising the appearance and font.\n\n\nIn the same ‘Appearance’ submenu as the theme settings, you can find an option to change fonts. Monospace fonts, ones where each character takes up the same width, will appear here automatically if you’ve installed them on your computer. One popular font for coding is Fira Code, which has the special property of converting certain sets of characters into ‘ligatures’, which some people find easier to read. For example, the base pipe will appear as a rightward-pointing arrow rather than its constituent vertical-pipe and greater-than symbol (|>)."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#example",
- "title": "Unit testing in R",
- "section": "Example",
- "text": "Example\nWe have a {shiny} app which grabs some data from a database, manipulates the data, and generates a plot.\n\n\nwe would write unit tests to check the data manipulation and plot functions work correctly (with pre-created sample/simple datasets)\nwe would write integration tests to check that the data manipulation function works with the plot function (with similar data to what we used for the unit tests)\nwe would write e2e tests to ensure that from start to finish the app grabs the data and produces a plot as required\n\n\n\nsimple (unit tests) to complex (e2e tests)"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#panes",
+ "title": "RStudio Tips and Tricks",
+ "section": "Panes",
+ "text": "Panes\n\nLayout\nThe structural layout of RStudio’s panes can be adjusted. One simple thing you can do is minimise and maximise each pane by clicking the window icons in their upper-right corners. This is useful when you want more screen real-estate for a particular pane.\nYou can move pane loations too. Click the ‘Workspace Panes’ button (a square with four more inside it) at the top of the IDE to see a number of settings. For example, you can select ‘Console on the right’ to move the R console to the upper-right pane, which you may prefer for maximimsing the vertical space in which code is shown. You could also click Pane Layout… in this menu to be taken to Tools > Global Options… > Pane layout, where you can click ‘Add Column’ to insert new script panes that allow you to inspect and write multiple files side-by-side.\n\n\nScript navigation\nThe script pane in particular has a nice feature for navigating through sections of your script or Quarto/R Markdown files. Click the ‘Show Document Outline’ button or use the keyboard shortcut Control + Shift + O to slide open a tray that provides a nice indented list of all the sections and function defintions in your file.\nSection headers are auto-detected in a Quarto or R Markdown document wherever the Markdown header markup has been used: one hashmark (#) for a level 1 header, two for level 2, and so on. To add section headers to an R Script, add at least four hyphens after a commented line that starts with #. 
Use two or more hashes at the start of the comment to increase the nestedness of that section.\n\n# Header ------------------------------------------------------------------\n\n## Section ----\n\n### Subsection ----\n\nNote that Ctrl + Shift + R will open a dialog box for you to input the name of a section header, which will be inserted and automatically padded to 75 characters to provide a strong visual cue between sections.\nAs well as the document outline, there’s also a reminder in the lower-left of the script pane that gives the name of the section that your cursor is currently in. A symbol is also shown: a hashmark means it’s a headed section and an ‘f’ means it’s a function definition. You can click this to jump to other sections.\n\n\n\nNavigating with headers in the R script pane.\n\n\n\n\nBackground jobs\nPerhaps an under-used pane is ‘Background jobs’. This is where you can run a separate R process that keeps your R console free. Go to Tools > Background Jobs > Start Background Job… to expose this tab if it isn’t already listed alongside the R console.\nWhy might you want to do this? As I write this post, there’s a background process to detect changes to the Quarto document that I’m writing and then update a preview I have running in the browser. You can do something similar for Shiny apps. You can continue to develop your app and test things in the console and the app preview will update on save. You won’t need to keep hitting the ‘Render’ or ‘Run app’ button every time you make a change."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-pyramid",
- "title": "Unit testing in R",
- "section": "Testing Pyramid",
- "text": "Testing Pyramid\n\n\nImage source: The Testing Pyramid: Simplified for One and All headspin.io"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#magic-wand",
+ "title": "RStudio Tips and Tricks",
+ "section": "Magic wand",
+ "text": "Magic wand\nThere’s a miscellany of useful tools available when you click the ‘magic wand’ button in the script pane.\n\n\n\nAbracadabra! Casting open the ‘magic wand’ menu.\n\n\nThis includes:\n\n‘Rename in Scope’, which is like find-and-replace but you only change instances with the same ‘scope’, so you could select the variable x, go to Rename in Scope and then you can edit all instances of the variable in the document and change them at the same time (e.g. to rename them)\n‘Reflow Comment’, which you can click after higlighting a comments block to have the comments automatically line-break at the maximum width\n‘Insert Roxygen Skeleton’, which you can click when your cursor is inside the body of a function you’ve written and a {roxygen2} documentation template will be added above your function with the @params argument names pre-filled\n\nAlong with ‘Comment/Uncomment Lines’, ‘Reindent Lines’ and ‘Reformat Lines’, mentioned above in the keyboard shortcuts section."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function",
- "title": "Unit testing in R",
- "section": "Let’s create a simple function…",
- "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#wrapping-up",
+ "title": "RStudio Tips and Tricks",
+ "section": "Wrapping up",
+ "text": "Wrapping up\nTime was limited in our discussion. There are so many more tips and tricks that we didn’t get to. Let us know what we missed, or what your favourite shortcuts and settings are."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-1",
- "title": "Unit testing in R",
- "section": "Let’s create a simple function…",
- "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}"
+ "objectID": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes",
+ "href": "blogs/posts/2023-03-21-rstudio-tips/index.html#footnotes",
+ "title": "RStudio Tips and Tricks",
+ "section": "Footnotes",
+ "text": "Footnotes\n\n\nFor the same reason it’s a good idea to restart R on a frequent basis. You may assume that an object x in your environment was made in a certain way and contains certain information, but does it? What if you overwrote it at some point and forgot? Best to wipe the slate clean and rebuild it from scratch. Jenny Bryan has written an explainer.↩︎\nYou can ‘snap focus’ to the script and console panes with the pre-existing shortcuts Control + 1 and Control + 2. My next most-used pane is the terminal, so I’ve re-mapped the shortcut to Control + 3.↩︎\nThe classic shortcuts of select-all (Control + A), cut (Control + X), copy Control + C, paste (Control + V), undo (Control + Z) and redo (Control + Shift + Z) are all available when editing.↩︎\nNote that you can set the default pipe to the base-R version (|>) by checking the box at Tools > Global Options… > Code > Use native pipe operator↩︎\nProbably ‘M’ for {magrittr}, the name of the package that contains the %>% incarnation of the operator.↩︎"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-create-a-simple-function-2",
- "title": "Unit testing in R",
- "section": "Let’s create a simple function…",
- "text": "Let’s create a simple function…\n\nmy_function <- function(x, y) {\n \n stopifnot(\n \"x must be numeric\" = is.numeric(x),\n \"y must be numeric\" = is.numeric(y),\n \"x must be same length as y\" = length(x) == length(y),\n \"cannot divide by zero!\" = y != 0\n )\n\n x / y\n}\n\n\nThe Ten Rules of Defensive Programming in R"
+ "objectID": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html",
+ "href": "blogs/posts/2024-05-22-storing-data-safely/azure_python.html",
+ "title": "Data Science @ The Strategy Unit",
+ "section": "",
+ "text": "import os\nimport io\nimport pandas as pd\nfrom dotenv import load_dotenv\nfrom azure.identity import DefaultAzureCredential\nfrom azure.storage.blob import ContainerClient\n\n\n# Load all environment variables\nload_dotenv()\naccount_url = os.getenv('AZ_STORAGE_EP')\ncontainer_name = os.getenv('AZ_STORAGE_CONTAINER')\n\n\n# Authenticate\ndefault_credential = DefaultAzureCredential()\n\nFor the first time, you might need to authenticate via the Azure CLI\nDownload it from https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-windows?tabs=azure-cli\nInstall then run az login in your terminal. Once you have logged in with your browser try the DefaultAzureCredential() again!\n\n# Connect to container\ncontainer_client = ContainerClient(account_url, container_name, default_credential)\n\n\n# List files in container - should be empty\nblob_list = container_client.list_blob_names()\nfor blob in blob_list:\n if blob.startswith('newdir'):\n print(blob)\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Upload file to container\nwith open(file='data/cats.csv', mode=\"rb\") as data:\n blob_client = container_client.upload_blob(name='newdir/cats.csv', \n data=data, \n overwrite=True)\n\n\n# # Check files have uploaded - List files in container again\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.csv\nnewdir/cats.parquet\nnewdir/ronald.jpeg\n\n\n\n# Download file from Azure container to temporary filepath\n\n# Connect to blob\nblob_client = container_client.get_blob_client('newdir/cats.csv')\n\n# Write to local file from blob\ntemp_filepath = os.path.join('temp_data', 'cats.csv')\nwith open(file=temp_filepath, mode=\"wb\") as sample_blob:\n download_stream = blob_client.download_blob()\n sample_blob.write(download_stream.readall())\ncat_data = 
pd.read_csv(temp_filepath)\ncat_data.head()\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# Load directly from Azure - no local copy\n\ndownload_stream = blob_client.download_blob()\nstream_object = io.BytesIO(download_stream.readall())\ncat_data = pd.read_csv(stream_object)\ncat_data\n\n\n\n\n\n\n\n\nName\nPhysical_characteristics\nBehaviour\n\n\n\n\n0\nRonald\nWhite and ginger\nLazy and greedy but undoubtedly cutest and best\n\n\n1\nKaspie\nSmall calico\nSweet and very shy but adventurous\n\n\n2\nHennimore\nPale orange\nUnhinged and always in a state of panic\n\n\n3\nThug cat\nBlack and white - very large\nLocal bully\n\n\n4\nSon of Stripey\nGrey tabby\nVery vocal\n\n\n\n\n\n\n\n\n# !!!!!!!!! Delete from Azure container !!!!!!!!!\nblob_client = container_client.get_blob_client('newdir/cats.csv')\nblob_client.delete_blob()\n\n\nblob_list = container_client.list_blobs()\nfor blob in blob_list:\n if blob['name'].startswith('newdir'):\n print(blob['name'])\n\nnewdir/cats.parquet\nnewdir/ronald.jpeg"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
+ "objectID": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html",
+ "href": "blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html",
+ "title": "Advent of Code and Test Driven Development",
+ "section": "",
+ "text": "Advent of Code is an annual event, where daily coding puzzles are released from 1st – 24th December. We ran one of our fortnightly Coffee & Coding sessions introducing Advent of Code to people who code in the Strategy Unit, as well as the concept of test-driven development as a potential way of approaching the puzzles.\nTest-driven development (TDD) is an approach to coding which involves writing the test for a function BEFORE we write the function. This might seem quite counterintuitive, but it makes it easier to identify bugs 🐛 when they are introduced to our code, and ensures that our functions meet all necessary criteria. From my experience, this takes quite a long time to implement and can be quite tedious, but it is definitely worth it overall, especially as your project develops. Testing is also recommended in the NHS Reproducible Analytical Pipeline (RAP) guidelines.\nAn interesting thing to note about TDD is that we’re always expecting our first test to fail, and indeed failing tests are useful and important! If we wrote tests that just passed all the time, this would not be useful at all for our code.\nThe way that Advent of Code is structured, with test data for each puzzle and an expected test result, makes it very amenable to a test-driven approach. In order to support this, Matt and I created template repositories for a test-driven approach to Advent of Code, in Python and in R.\nOur goal when setting this up was to introduce others in the Strategy Unit to both TDD and Advent of Code. Advent of code can be challenging and I personally struggle to get past the first week, but it encourages creative (and maybe even fun?!) approaches to coding problems. I’m glad that we had the chance to explore some of the puzzles together in Coffee & Coding – it was interesting to see so many different approaches to the same problem, and hopefully it also gave us all the chance to practice writing tests."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-1",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
+ "objectID": "blogs/posts/2023-04-26_alternative_remotes.html",
+ "href": "blogs/posts/2023-04-26_alternative_remotes.html",
+ "title": "Alternative remote repositories",
+ "section": "",
+ "text": "It’s great when someone send’s you a pull request on GitHub to fix bugs or add new features to your project, but you probably always want to check the other persons work in someway before merging that pull request.\nAll of the steps below are intended to be entered via a terminal.\nLet’s imagine that we have a GitHub account called example and a repository called test, and we use https rather than ssh.\n$ git remote get-url origin\n# https://github.com/example/test.git\nNow, let’s say we have someone who has submitted a Pull Request (PR), and their username is friend. We can add a new remote for their fork with\n$ git remote add friend https://github.com/friend/test.git\nHere, I name the remote exactly as per the persons GitHub username for no other reason than making it easier to track things later on. You could name this remote whatever you like, but you will need to make sure that the remote url matches their repository correctly.\nWe are now able to checkout their remote branch. First, we will want to fetch their work:\n# make sure to replace the remote name to what you set it to before\n$ git fetch friend\nNow, hopefully they have commited to a branch with a name that you haven’t used. Let’s say they created a branch called my_work. 
You can then simply run\n$ git switch friend/my_work\nThis should checkout the my_work branch locally for you.\nNow, if they have happened to use a branch name that you are already using, or more likely, directly commited to their own main branch, you will need to do checkout to a new branch:\n# replace friend as above to be the name of the remote, and main to be the branch\n# that they have used\n# replace their_work with whatever you want to call this branch locally\n$ git checkout friend/main -b their_work\nYou are now ready to run their code and check everything is good to merge!\nFinally, If you want to clean up your local repository you can remove the new branch that you checked out and the new remote with the following steps:\n# switch back to one of your branches, e.g. main\n$ git checkout main\n\n# then remove the branch that you created above\n$ git branch -D their_work\n\n# you can remove the remote\n$ git remote remove friend"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-2",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#why",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#why",
+ "title": "Store Data Safely",
+ "section": "Why?",
+ "text": "Why?\nBecause:\n\ndata may be sensitive\nGitHub was designed for source control of code\nGitHub has repository file-size limits\nit makes data independent from code\nit prevents repetition"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-3",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#other-approaches",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#other-approaches",
+ "title": "Store Data Safely",
+ "section": "Other approaches",
+ "text": "Other approaches\nTo prevent data commits:\n\nuse a .gitignore file (*.csv, etc)\nuse Git hooks\navoid ‘add all’ (git add .) when staging\nensure thorough reviews of (small) pull-requests"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-4",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#what-if-i-committed-data",
+ "title": "Store Data Safely",
+ "section": "What if I committed data?",
+ "text": "What if I committed data?\n‘It depends’, but if it’s sensitive:\n\n‘undo’ the commit with git reset\nuse a tool like BFG to expunge the file from Git history\ndelete the repo and restart 🔥\n\nA data security breach may have to be reported."
},
- {
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#and-create-our-first-test-5",
- "title": "Unit testing in R",
- "section": "… and create our first test",
- "text": "… and create our first test\n\ntest_that(\"my_function correctly divides values\", {\n expect_equal(\n my_function(4, 2),\n 2\n )\n expect_equal(\n my_function(1, 4),\n 0.25\n )\n expect_equal(\n my_function(c(4, 1), c(2, 4)),\n c(2, 0.25)\n )\n})\n\nTest passed 😸"
+ {
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#data-hosting-solutions",
+ "title": "Store Data Safely",
+ "section": "Data-hosting solutions",
+ "text": "Data-hosting solutions\nWe’ll talk about two main options for The Strategy Unit:\n\nPosit Connect and the {pins} package\nAzure Data Storage\n\nWhich to use? It depends."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#other-expect_-functions",
- "title": "Unit testing in R",
- "section": "other expect_*() functions…",
- "text": "other expect_*() functions…\n\ntest_that(\"my_function correctly divides values\", {\n expect_lt(\n my_function(4, 2),\n 10\n )\n expect_gt(\n my_function(1, 4),\n 0.2\n )\n expect_length(\n my_function(c(4, 1), c(2, 4)),\n 2\n )\n})\n\nTest passed 🎉\n\n\n\n{testthat} documentation"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#a-platform-by-posit",
+ "title": "Store Data Safely",
+ "section": "A platform by Posit",
+ "text": "A platform by Posit\n\n\nhttps://connect.strategyunitwm.nhs.uk/"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert",
- "title": "Unit testing in R",
- "section": "Arrange, Act, Assert",
- "text": "Arrange, Act, Assert\n\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n # \n #\n #\n\n # act\n #\n\n # assert\n #\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#a-package-by-posit",
+ "title": "Store Data Safely",
+ "section": "A package by Posit",
+ "text": "A package by Posit\n\n\nhttps://pins.rstudio.com/"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-1",
- "title": "Unit testing in R",
- "section": "Arrange, Act, Assert",
- "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\n\n\nto create sample values\ncreate fake/temporary files\nset random seed\nset R options/environment variables\n\n\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n #\n\n # assert\n #\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#basic-approach",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#basic-approach",
+ "title": "Store Data Safely",
+ "section": "Basic approach",
+ "text": "Basic approach\ninstall.packages(\"pins\")\nlibrary(pins)\n\nboard_connect()\npin_write(board, data, \"pin_name\")\npin_read(board, \"user_name/pin_name\")"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-2",
- "title": "Unit testing in R",
- "section": "Arrange, Act, Assert",
- "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n #\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#live-demo",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#live-demo",
+ "title": "Store Data Safely",
+ "section": "Live demo",
+ "text": "Live demo\n\nLink RStudio to Posit Connect (authenticate)\nConnect to the board\nWrite a new pin\nCheck pin status and details\nPin versions\nUse pinned data\nUnpin your pin"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#arrange-act-assert-3",
- "title": "Unit testing in R",
- "section": "Arrange, Act, Assert",
- "text": "Arrange, Act, Assert\n\n\nwe arrange the environment, before running the function\nwe act by calling the function\nwe assert that the actual results match our expected results\n\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#should-i-use-it",
+ "title": "Store Data Safely",
+ "section": "Should I use it?",
+ "text": "Should I use it?\n\n\n⚠️ {pins} is not great because:\n\nyou should not upload sensitive data!\nthere’s a file-size upload limit\npin organisation is a bit awkward (no subfolders)\n\n\n{pins} is helpful because:\n\nauthentication is straightforward\ndata can be versioned\nyou can control permissions\nthere are R and Python versions of the package"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#our-test-failed",
- "title": "Unit testing in R",
- "section": "Our test failed!?! 😢",
- "text": "Our test failed!?! 😢\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected)\n})\n\n── Failure: my_function works ──────────────────────────────────────────────────\n`actual` not equal to `expected`.\n1/1 mismatches\n[1] 0.714 - 0.714 == 7.14e-07\n\n\nError:\n! Test failed"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#what-is-azure-data-storage",
+ "title": "Store Data Safely",
+ "section": "What is Azure Data Storage?",
+ "text": "What is Azure Data Storage?\nMicrosoft cloud storage for unstructured data or ‘blobs’ (Binary Large Objects): data objects in binary form that do not necessarily conform to any file format.\nHow is it different?\n\nNo hierarchy – although you can make pseudo-‘folders’ with the blobnames.\nAuthenticates with your Microsoft account."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#tolerance-to-the-rescue",
- "title": "Unit testing in R",
- "section": "Tolerance to the rescue 🙂",
- "text": "Tolerance to the rescue 🙂\n\ntest_that(\"my_function works\", {\n # arrange\n x <- 5\n y <- 7\n expected <- 0.714285\n\n # act\n actual <- my_function(x, y)\n\n # assert\n expect_equal(actual, expected, tolerance = 1e-6)\n})\n\nTest passed 🎊\n\n\n\n(this is a slightly artificial example, usually the default tolerance is good enough)"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#authenticating-to-azure-data-storage",
+ "title": "Store Data Safely",
+ "section": "Authenticating to Azure Data Storage",
+ "text": "Authenticating to Azure Data Storage\n\nYou are all part of the “strategy-unit-analysts” group; this gives you read/write access to specific Azure storage containers.\nYou can store sensitive information like the container ID in a local .Renviron or .env file that should be ignored by git.\nUsing {AzureAuth}, {AzureStor} and your credentials, you can connect to the Azure storage container, upload files and download them, or read the files directly from storage!"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-edge-cases",
- "title": "Unit testing in R",
- "section": "Testing edge cases",
- "text": "Testing edge cases\n\n\nRemember the validation steps we built into our function to handle edge cases?\n\nLet’s write tests for these edge cases:\nwe expect errors\n\n\ntest_that(\"my_function works\", {\n expect_error(my_function(5, 0))\n expect_error(my_function(\"a\", 3))\n expect_error(my_function(3, \"a\"))\n expect_error(my_function(1:2, 4))\n})\n\nTest passed 🎊"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables",
+ "title": "Store Data Safely",
+ "section": "Step 1: load your environment variables",
+ "text": "Step 1: load your environment variables\nStore sensitive info in an .Renviron file that’s kept out of your Git history! The info can then be loaded in your script.\n.Renviron:\nAZ_STORAGE_EP=https://STORAGEACCOUNT.blob.core.windows.net/\nScript:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\nTip: reload .Renviron with readRenviron(\".Renviron\")"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example",
- "title": "Unit testing in R",
- "section": "Another (simple) example",
- "text": "Another (simple) example\n\n\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\nConsider this function - there is branched logic, so we need to carefully design tests to validate the logic works as intended."
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#step-1-load-your-environment-variables-1",
+ "title": "Store Data Safely",
+ "section": "Step 1: load your environment variables",
+ "text": "Step 1: load your environment variables\nIn the demo script we are providing, you will need these environment variables:\nep_uri <- Sys.getenv(\"AZ_STORAGE_EP\")\napp_id <- Sys.getenv(\"AZ_APP_ID\")\ncontainer_name <- Sys.getenv(\"AZ_STORAGE_CONTAINER\")\ntenant <- Sys.getenv(\"AZ_TENANT_ID\")"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#another-simple-example-1",
- "title": "Unit testing in R",
- "section": "Another (simple) example",
- "text": "Another (simple) example\n\nmy_new_function <- function(x, y) {\n if (x > y) {\n \"x\"\n } else {\n \"y\"\n }\n}\n\n\n\ntest_that(\"it returns 'x' if x is bigger than y\", {\n expect_equal(my_new_function(4, 3), \"x\")\n})\n\nTest passed 🎉\n\ntest_that(\"it returns 'y' if y is bigger than x\", {\n expect_equal(my_new_function(3, 4), \"y\")\n expect_equal(my_new_function(3, 3), \"y\")\n})\n\nTest passed 🥳"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#step-2-authenticate-with-azure",
+ "title": "Store Data Safely",
+ "section": "Step 2: Authenticate with Azure",
+ "text": "Step 2: Authenticate with Azure\n\n\ntoken <- AzureAuth::get_azure_token(\n \"https://storage.azure.com\",\n tenant = tenant,\n app = app_id,\n auth_type = \"device_code\",\n)\nThe first time you do this, you will have link to authenticate in your browser and a code in your terminal to enter. Use the browser that works best with your @mlcsu.nhs.uk account!"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#how-to-design-good-tests",
- "title": "Unit testing in R",
- "section": "How to design good tests",
- "text": "How to design good tests\na non-exhaustive list\n\nconsider all the functions arguments,\nwhat are the expected values for these arguments?\nwhat are unexpected values, and are they handled?\nare there edge cases that need to be handled?\nhave you covered all of the different paths in your code?\nhave you managed to create tests that check the range of results you expect?"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#step-3-connect-to-container",
+ "title": "Store Data Safely",
+ "section": "Step 3: Connect to container",
+ "text": "Step 3: Connect to container\nendpoint <- AzureStor::blob_endpoint(ep_uri, token = token)\ncontainer <- AzureStor::storage_container(endpoint, container_name)\n\n# List files in container\nblob_list <- AzureStor::list_blobs(container)\nIf you get 403 error, delete your token and re-authenticate, try a different browser/incognito, etc.\nTo clear Azure tokens: AzureAuth::clean_token_directory()"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#but-why-create-tests",
- "title": "Unit testing in R",
- "section": "But, why create tests?",
- "text": "But, why create tests?\nanother non-exhaustive list\n\ngood tests will help you uncover existing issues in your code\nwill defend you from future changes that break existing functionality\nwill alert you to changes in dependencies that may have changed the functionality of your code\ncan act as documentation for other developers"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container",
+ "title": "Store Data Safely",
+ "section": "Interact with the container",
+ "text": "Interact with the container\nIt’s possible to interact with the container via your browser!\nYou can upload and download files using the Graphical User Interface (GUI), login with your @mlcsu.nhs.uk account: https://portal.azure.com/#home\nAlthough it’s also cooler to interact via code… 😎"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#testing-complex-functions",
- "title": "Unit testing in R",
- "section": "Testing complex functions",
- "text": "Testing complex functions\n\n\n\nmy_big_function <- function(type) {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n df <- tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n\n conditions <- read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date) |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}\n\n\nWhere do you even begin to start writing tests for something so complex?\n\n\nNote: to get the code on the left to fit on one page, I skipped including a few library calls\n\nlibrary(tidyverse)\nlibrary(DBI)"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-1",
+ "title": "Store Data Safely",
+ "section": "Interact with the container",
+ "text": "Interact with the container\n# Upload contents of a local directory to container\nAzureStor::storage_multiupload(\n container,\n \"LOCAL_FOLDERNAME/*\",\n \"FOLDERNAME_ON_AZURE\"\n)\n\n# Upload specific file to container\nAzureStor::storage_upload(\n container,\n \"data/ronald.jpeg\",\n \"newdir/ronald.jpeg\"\n)"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions",
- "title": "Unit testing in R",
- "section": "Split the logic into smaller functions",
- "text": "Split the logic into smaller functions\nFunction to get the data from the database\n\nget_data_from_sql <- function() {\n con <- dbConnect(RSQLite::SQLite(), \"data.db\")\n tbl(con, \"data_table\") |>\n collect() |>\n mutate(across(date, lubridate::ymd))\n}"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#load-csv-files-directly-from-azure-container",
+ "title": "Store Data Safely",
+ "section": "Load csv files directly from Azure container",
+ "text": "Load csv files directly from Azure container\ndf_from_azure <- AzureStor::storage_read_csv(\n container,\n \"newdir/cats.csv\",\n show_col_types = FALSE\n)\n\n# Load file directly from Azure container (by storing it in memory)\n\nparquet_in_memory <- AzureStor::storage_download(\n container, src = \"newdir/cats.parquet\", dest = NULL\n)\n\nparq_df <- arrow::read_parquet(parquet_in_memory)"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-1",
- "title": "Unit testing in R",
- "section": "Split the logic into smaller functions",
- "text": "Split the logic into smaller functions\nFunction to get the relevant conditions\n\nget_conditions <- function(type) {\n read_csv(\n \"conditions.csv\", col_types = \"cc\"\n ) |>\n filter(condition_type == type)\n}"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#interact-with-the-container-2",
+ "title": "Store Data Safely",
+ "section": "Interact with the container",
+ "text": "Interact with the container\n# Delete from Azure container (!!!)\nAzureStor::delete_storage_file(container, BLOB_NAME)"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-2",
- "title": "Unit testing in R",
- "section": "Split the logic into smaller functions",
- "text": "Split the logic into smaller functions\nFunction to combine the data and create a count by date\n\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}"
+ "objectID": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve",
+ "href": "presentations/2024-05-16_store-data-safely/index.html#what-does-this-achieve",
+ "title": "Store Data Safely",
+ "section": "What does this achieve?",
+ "text": "What does this achieve?\n\nData is not in the repository, it is instead stored in a secure location\nCode can be open – sensitive information like Azure container name stored as environment variables\nLarge filesizes possible, other people can also access the same container.\nNaming conventions can help to keep blobs organised (these create pseudo-folders)\n\n\n\n\nLearn more about Data Science at The Strategy Unit"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-3",
- "title": "Unit testing in R",
- "section": "Split the logic into smaller functions",
- "text": "Split the logic into smaller functions\nFunction to generate a plot from the summarised data\n\ncreate_plot <- function(df) {\n df |>\n ggplot(aes(date, n)) +\n geom_line() +\n geom_point()\n}"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#health-data-in-the-headlines",
+ "title": "System Dynamics in health and care",
+ "section": "Health Data in the Headlines",
+ "text": "Health Data in the Headlines\n\n\n\n\nUsed to seeing headlines that give a snapshot figure but doesn’t say much about the system.\nNow starting to see headlines that recognise flow through the system rather than snapshot in time of just one part.\nCan get better understanding of the issues in a system if we can map it as stocks and flows, but our datasets not designed to give up this information very readily. This talk is how I have tried to meet that challenge."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#split-the-logic-into-smaller-functions-4",
- "title": "Unit testing in R",
- "section": "Split the logic into smaller functions",
- "text": "Split the logic into smaller functions\nThe original function refactored to use the new functions\n\nmy_big_function <- function(type) {\n conditions <- get_conditions(type)\n\n get_data_from_sql() |>\n summarise_data(conditions) |>\n create_plot()\n}\n\n\nThis is going to be significantly easier to test, because we now can verify that the individual components work correctly, rather than having to consider all of the possibilities at once."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#through-the-system-dynamics-lens",
+ "title": "System Dynamics in health and care",
+ "section": "Through the System Dynamics lens",
+ "text": "Through the System Dynamics lens\n\nStock-flow model\nDynamic behaviour, feedback loops\n\nIn a few seconds, what is SD?\nAn approach to understanding the behaviour of complex systems over time. A method of mapping a system as stocks, whose levels can only change due to flows in and flows out. Stocks could be people on a waiting list, on a ward, money, …\nFlows are the rate at which things change in a given time period e.g. admissions per day, referrals per month.\nBehaviour of the system is determined by how the components interact with each other, not what each component does. Mapping the structure of a system like this leads us to identify feedback loops, and consequences of an action - both intended and unintended.\nIn this capacity-constrained model we only need 3 parameters to run the model (exogenous). All the behaviour within the grey box is determined by the interactions of those components (indogenous).\nHow do we get a value/values for referrals per day?\n(currently use specialist software to build and run our models, aim is to get to a point where we can run in open source.)"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\nsummarise_data <- function(df, conditions) {\n df |>\n semi_join(conditions, by = \"condition\") |>\n count(date)\n}"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-flows",
+ "title": "System Dynamics in health and care",
+ "section": "Determining flows",
+ "text": "Determining flows\n\n\n\n\n‘admissions per day’ is needed to populate the model.\n‘discharged’ could be used to verify the model against known data\n\nHow many admissions per day (or week, month…)\n\n\n\n\n\n\n\n \n\n\nGoing to use very simple model shown to explain how to extract flow data for admissions. Will start with visual explainer before going into the code.\n1. generate list of key dates (in this case daily, could be weekly, monthly)\n2. take our patient-level ID with admission and discharge dates\n3. count of admissions on that day/week"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-1",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\ntest_that(\"it summarises the data\", {\n # arrange\n \n\n\n\n\n\n\n \n\n \n # act\n \n # assert\n \n})"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#determining-occupancy",
+ "title": "System Dynamics in health and care",
+ "section": "Determining occupancy",
+ "text": "Determining occupancy\n\n\n\n\n‘on ward’ is used to verify the model against known data\n\nLogic statement testing if the key date is wholly between admission and discharge dates\nflag for a match \n\n\n\n\n\n\n \n\n\nMight also want to generate occupancy, to compare the model output with actual data to verify/validate.\n1. generate list of key dates\n2. take our patient-level ID with admission and discharge dates\n3. going to take each date in our list of keydates, and see if there is an admission before that date and discharge after 4. this creates a wide data frame, the same length as patient data.\n5. once run through all the dates in the list, sum each column\nPatient A admitted on 2nd, so only starts being classed as resident on 3rd."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-2",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n \n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nGenerate some random data to build a reasonably sized data frame.\nYou could also create a table manually, but part of the trick of writing good tests for this function is to make it so the dates don’t all have the same count.\nThe reason for this is it’s harder to know for sure that the count worked if every row returns the same value.\nWe don’t need the values to be exactly like they are in the real data, just close enough. Instead of dates, we can use numbers, and instead of actual conditions, we can use letters."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---flows",
+ "title": "System Dynamics in health and care",
+ "section": "in R - flows",
+ "text": "in R - flows\nEasy to do with count, or group_by and summarise\n\n\n admit_d <- spell_dates |> \n group_by(date_admit) |>\n count(date_admit)\n\nhead(admit_d)\n\n\n# A tibble: 6 × 2\n# Groups: date_admit [6]\n date_admit n\n <date> <int>\n1 2022-01-01 28\n2 2022-01-02 24\n3 2022-01-03 21\n4 2022-01-04 27\n5 2022-01-05 32\n6 2022-01-06 27"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-3",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n \n\n\n\n\n # act\n \n # assert\n \n})\n\nTests need to be reproducible, and generating our table at random will give us unpredictable results.\nSo, we need to set the random seed; now every time this test runs we will generate the same data."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy",
+ "title": "System Dynamics in health and care",
+ "section": "in R - occupancy",
+ "text": "in R - occupancy\nGenerate list of key dates\n\n\n\ndate_start <- dmy(01012022) \ndate_end <- dmy(31012022)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"day\"))\n\nkeydates <- data.frame(\n date = c(seq(date_start, by = \"day\", length.out=run_len))) \n\n\n\n\n date\n1 2022-01-01\n2 2022-01-02\n3 2022-01-03\n4 2022-01-04\n5 2022-01-05\n6 2022-01-06\n\n\n\n\nStart by generating the list of keydates. In this example we’re running the model in days, and checking each day in 2022.\nNeed the run length for the next step, to know how many times to iterate over"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-4",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n\n\n\n # act\n \n # assert\n \n})\n\nCreate the conditions table. We don’t need all of the columns that are present in the real csv, just the ones that will make our code work.\nWe also need to test that the filtering join (semi_join) is working, so we want to use a subset of the conditions that were used in df."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#in-r---occupancy-1",
+ "title": "System Dynamics in health and care",
+ "section": "in R - occupancy",
+ "text": "in R - occupancy\nIterate over each date - need to have been admitted before, and discharged after\n\noccupancy_flag <- function(df) {\n\n # pre-allocate tibble size to speed up iteration in loop\n activity_all <- tibble(nrow = nrow(df)) |> \n select()\n \n for (i in 1:run_len) {\n \n activity_period <- case_when(\n \n # creates 1 flag if resident for complete day\n df$date_admit < keydates$keydate[i] & \n df$date_discharge > keydates$keydate[i] ~ 1,\n TRUE ~ 0)\n \n # column bind this day's flags to previous\n activity_all <- bind_cols(activity_all, activity_period)\n \n }\n \n # rename column to match the day being counted\n activity_all <- activity_all |> \n setNames(paste0(\"d_\", keydates$date))\n \n # bind flags columns to patient data\n daily_adm <- bind_cols(df, activity_all) |> \n pivot_longer(\n cols = starts_with(\"d_\"),\n names_to = \"date\",\n values_to = \"count\"\n ) |> \n \n group_by(date) |> \n summarise(resident = sum(count)) |> \n ungroup() |> \n mutate(date = str_remove(date, \"d_\"))\n \n } \n\n\nIs there a better way than using a for loop?\n\nPre-allocate tibbles\nactivity_all will end up as very wide tibble, with a column for each date in list of keydates.\nFor each date in the list of key dates, compares with admission date & discharge date; need to be admitted before the key date and discharged after the key date. If match, flag = 1.\nCreates a column for each day, then binds this to activity all.\nRename each column with the date it was checking (add a character to start of column name so column doesn’t start with numeric)\nPivot long, then group by date and sum the flags (other variables could be added here, such as TFC or provider code)"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-5",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n \n \n\n \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nBecause we are generating df randomly, to figure out what our “expected” results are, I simply ran the code inside of the test to generate the “actual” results.\nGenerally, this isn’t a good idea. You are creating the results of your test from the code; ideally, you want to be thinking about what the results of your function should be.\nImagine your function doesn’t work as intended, there is some subtle bug that you are not yet aware of. By writing tests “backwards” you may write test cases that confirm the results, but not expose the bug. This is why it’s good to think about edge cases."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---flows",
+ "title": "System Dynamics in health and care",
+ "section": "Longer Time Periods - flows",
+ "text": "Longer Time Periods - flows\nUse lubridate::floor_date to generate the date at start of week/month\n\nadmit_wk <- spell_dates |> \n mutate(week_start = floor_date(\n date_admit, unit = \"week\", week_start = 1 # start week on Monday\n )) |> \n count(week_start) # could add other parameters such as provider code, TFC etc\n\nhead(admit_wk)\n\n\n\n# A tibble: 6 × 2\n week_start n\n <date> <int>\n1 2021-12-27 52\n2 2022-01-03 196\n3 2022-01-10 192\n4 2022-01-17 223\n5 2022-01-24 157\n6 2022-01-31 187\n\n\n\nMight run SD model in weeks or months - e.g. months for care homes Use lubridate to create new variable with start date of week/month/year etc"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-6",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\")) \n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n ) \n # act\n actual <- summarise_data(df, conditions)\n # assert\n \n})\n\nThat said, in cases where we can be confident (say by static analysis of our code) that it is correct, building tests in this way will give us the confidence going forwards that future changes do not break existing functionality.\nIn this case, I have created the expected data frame using the results from running the function."
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods---occupancy",
+ "title": "System Dynamics in health and care",
+ "section": "Longer Time Periods - occupancy",
+ "text": "Longer Time Periods - occupancy\nKey dates to include the dates at the start and end of each time period\n\n\n\ndate_start <- dmy(03012022) # first Monday of the year\ndate_end <- dmy(01012023)\nrun_len <- length(seq(from = date_start, to = date_end, by = \"week\"))\n\nkeydates <- data.frame(wk_start = c(seq(date_start, \n by = \"week\", \n length.out=run_len))) |> \n mutate(\n wk_end = wk_start + 6) # last date in time period\n\n\n\n\n wk_start wk_end\n1 2022-01-03 2022-01-09\n2 2022-01-10 2022-01-16\n3 2022-01-17 2022-01-23\n4 2022-01-24 2022-01-30\n5 2022-01-31 2022-02-06\n6 2022-02-07 2022-02-13\n\n\n\n\nModel might make more sense to run in weeks or months (e.g. care home), so list of keydates need a start date and end date for each time period."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#lets-test-summarise_data-7",
- "title": "Unit testing in R",
- "section": "Let’s test summarise_data",
- "text": "Let’s test summarise_data\n\n\n\ntest_that(\"it summarises the data\", {\n # arrange\n set.seed(123)\n df <- tibble(\n date = sample(1:10, 300, TRUE),\n condition = sample(c(\"a\", \"b\", \"c\"), 300, TRUE)\n )\n conditions <- tibble(condition = c(\"a\", \"b\"))\n expected <- tibble(\n date = 1:10,\n n = c(19, 18, 12, 14, 17, 18, 24, 18, 31, 21)\n )\n # act\n actual <- summarise_data(df, conditions)\n # assert\n expect_equal(actual, expected)\n})\n\nTest passed 😸\n\n\n\nThe test works!"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#longer-time-periods",
+ "title": "System Dynamics in health and care",
+ "section": "Longer Time Periods",
+ "text": "Longer Time Periods\nMore logic required if working in weeks or months - can only be in one place at any given time\n\n# flag for occupancy\nactivity_period <- case_when(\n \n # creates 1 flag if resident for complete week\n df$date_admit < keydates$wk_start[i] & df$date_discharge > keydates$wk_end[i] ~ 1,\n TRUE ~ 0)\n\n\nAnd a little bit more logic\nOccupancy requires the patient to have been admitted before the start of the week/month, and discharged after the end of the week/month"
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps",
- "title": "Unit testing in R",
- "section": "Next steps",
- "text": "Next steps\n\nYou can add tests to any R project (to test functions),\nBut {testthat} works best with Packages\nThe R Packages book has 3 chapters on testing\nThere are two useful helper functions in {usethis}\n\nuse_testthat() will set up the folders for test scripts\nuse_test() will create a test file for the currently open script"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#applying-the-data",
+ "title": "System Dynamics in health and care",
+ "section": "Applying the data",
+ "text": "Applying the data\n\n\nHow to apply this wrangling of data to the system dynamic model?\nAdmissions data used as an input to the flow - could be reduced to a single figure (average), or there may be variation by season/day of week etc.\nOccupancy (and discharges) used to verify the model output against known data."
},
{
- "objectID": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1",
- "href": "presentations/2023-08-23_nhs-r_unit-testing/index.html#next-steps-1",
- "title": "Unit testing in R",
- "section": "Next steps",
- "text": "Next steps\n\nIf your test needs to temporarily create a file, or change some R-options, the {withr} package has a lot of useful functions that will automatically clean things up when the test finishes\nIf you are writing tests that involve calling out to a database, or you want to test my_big_function (from before) without calling the intermediate functions, then you should look at the {mockery} package"
+    "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps",
+    "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#next-steps",
+    "title": "System Dynamics in health and care",
+    "section": "Next Steps",
+    "text": "Next Steps\n\nGeneralise the function to a state where it can be used by others - onto GitHub\nTurn this into a package\nOpen-source SD models and interfaces - R Shiny or Python"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#packages-we-are-using-today",
- "title": "Coffee and Coding",
- "section": "Packages we are using today",
- "text": "Packages we are using today\n\nlibrary(tidyverse)\n\nlibrary(sf)\n\nlibrary(tidygeocoder)\nlibrary(PostcodesioR)\n\nlibrary(osrm)\n\nlibrary(leaflet)"
+ "objectID": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions",
+ "href": "presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html#questions-comments-suggestions",
+ "title": "System Dynamics in health and care",
+ "section": "Questions, comments, suggestions?",
+ "text": "Questions, comments, suggestions?\n\n\n\nPlease get in touch!\n\nSally.Thompson37@nhs.net\n\n\n\nNHS-R conference 2023"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#getting-boundary-data",
- "title": "Coffee and Coding",
- "section": "Getting boundary data",
- "text": "Getting boundary data\nWe can use the ONS’s Geoportal we can grab boundary data to generate maps\n\n\n\nicb_url <- paste0(\n \"https://services1.arcgis.com\",\n \"/ESMARspQHYMw9BZ9/arcgis\",\n \"/rest/services\",\n \"/Integrated_Care_Boards_April_2023_EN_BGC\",\n \"/FeatureServer/0/query\",\n \"?outFields=*&where=1%3D1&f=geojson\"\n)\nicb_boundaries <- read_sf(icb_url)\n\nicb_boundaries |>\n ggplot() +\n geom_sf() +\n theme_void()"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-is-rap",
+ "title": "RAP",
+ "section": "What is RAP",
+ "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\nGoldacre review"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-is-the-icb_boundaries-data",
- "title": "Coffee and Coding",
- "section": "What is the icb_boundaries data?",
- "text": "What is the icb_boundaries data?\n\nicb_boundaries |>\n select(ICB23CD, ICB23NM)\n\nSimple feature collection with 42 features and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -6.418667 ymin: 49.86479 xmax: 1.763706 ymax: 55.81112\nGeodetic CRS: WGS 84\n# A tibble: 42 × 3\n ICB23CD ICB23NM geometry\n <chr> <chr> <MULTIPOLYGON [°]>\n 1 E54000008 NHS Cheshire and Merseyside Integrated C… (((-3.083264 53.2559, -3…\n 2 E54000010 NHS Staffordshire and Stoke-on-Trent Int… (((-1.950489 53.21188, -…\n 3 E54000011 NHS Shropshire, Telford and Wrekin Integ… (((-2.380794 52.99841, -…\n 4 E54000013 NHS Lincolnshire Integrated Care Board (((0.2687853 52.81584, 0…\n 5 E54000015 NHS Leicester, Leicestershire and Rutlan… (((-0.7875237 52.97762, …\n 6 E54000018 NHS Coventry and Warwickshire Integrated… (((-1.577608 52.67858, -…\n 7 E54000019 NHS Herefordshire and Worcestershire Int… (((-2.272042 52.43972, -…\n 8 E54000022 NHS Norfolk and Waveney Integrated Care … (((1.666741 52.31366, 1.…\n 9 E54000023 NHS Suffolk and North East Essex Integra… (((0.8997023 51.7732, 0.…\n10 E54000024 NHS Bedfordshire, Luton and Milton Keyne… (((-0.4577115 52.32009, …\n# ℹ 32 more rows"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-we-trying-to-achieve",
+ "title": "RAP",
+ "section": "What are we trying to achieve?",
+ "text": "What are we trying to achieve?\n\nLegibility\nReproducibility\nAccuracy\nLaziness"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-dataframes",
- "title": "Coffee and Coding",
- "section": "Working with geospatial dataframes",
- "text": "Working with geospatial dataframes\nWe can simply join sf data frames and “regular” data frames together\n\n\n\nicb_metrics <- icb_boundaries |>\n st_drop_geometry() |>\n select(ICB23CD) |>\n mutate(admissions = rpois(n(), 1000000))\n\nicb_boundaries |>\n inner_join(icb_metrics, by = \"ICB23CD\") |>\n ggplot() +\n geom_sf(aes(fill = admissions)) +\n scale_fill_viridis_c() +\n theme_void()"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#what-are-some-of-the-fundamental-principles",
+ "title": "RAP",
+ "section": "What are some of the fundamental principles?",
+ "text": "What are some of the fundamental principles?\n\nPredictability, reducing mental load, and reducing truck factor\nMaking it easy to collaborate with yourself and others on different computers, in the cloud, in six months’ time…\nDRY"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames",
- "title": "Coffee and Coding",
- "section": "Working with geospatial data frames",
- "text": "Working with geospatial data frames\nWe can manipulate sf objects like other data frames\n\n\n\nlondon_icbs <- icb_boundaries |>\n filter(ICB23NM |> stringr::str_detect(\"London\"))\n\nggplot() +\n geom_sf(data = london_icbs) +\n geom_sf(data = st_centroid(london_icbs)) +\n theme_void()"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#the-road-to-rap",
+ "title": "RAP",
+ "section": "The road to RAP",
+ "text": "The road to RAP\n\nWe’re roughly using NHS Digital’s RAP stages\nThere is an incredibly large amount to learn!\nConfession time! (everything I do not know…)\nYou don’t need to do it all at once\nYou don’t need to do it all at all ever\nEach thing you learn will incrementally help you\nRemember- that’s why we learnt this stuff. Because it helped us. And it can help you too"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#working-with-geospatial-data-frames-1",
- "title": "Coffee and Coding",
- "section": "Working with geospatial data frames",
- "text": "Working with geospatial data frames\nSummarising the data will combine the geometries.\n\nlondon_icbs |>\n summarise(area = sum(Shape__Area)) |>\n # and use geospatial functions to create calculations using the geometry\n mutate(new_area = st_area(geometry), .before = \"geometry\")\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: MULTIPOLYGON\nDimension: XY\nBounding box: xmin: -0.5102803 ymin: 51.28676 xmax: 0.3340241 ymax: 51.69188\nGeodetic CRS: WGS 84\n# A tibble: 1 × 3\n area new_area geometry\n* <dbl> [m^2] <MULTIPOLYGON [°]>\n1 1573336388. 1567995610. (((-0.3314819 51.43935, -0.3306676 51.43889, -0.33118…\n\n\n Why the difference in area?\n\n We are using a simplified geometry, so calculating the area will be slightly inaccurate. The original area was calculated on the non-simplified geometries."
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--baseline",
+ "title": "RAP",
+ "section": "Levels of RAP- Baseline",
+ "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL).\nCode is version controlled (see Git basics and using Git collaboratively guides).\nRepository includes a README.md file (or equivalent) that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed.\nCode is published in the open and linked to & from accompanying publication (if relevant).\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-our-own-geospatial-data",
- "title": "Coffee and Coding",
- "section": "Creating our own geospatial data",
-    "text": "Creating our own geospatial data\n\nlocation_raw <- postcode_lookup(\"B2 4BJ\")\nglimpse(location_raw)\n\nRows: 1\nColumns: 40\n$ postcode                             <chr> \"B2 4BJ\"\n$ quality                              <int> 1\n$ eastings                             <int> 406866\n$ northings                            <int> 286775\n$ country                              <chr> \"England\"\n$ nhs_ha                               <chr> \"West Midlands\"\n$ longitude                            <dbl> -1.90033\n$ latitude                             <dbl> 52.47887\n$ european_electoral_region            <chr> \"West Midlands\"\n$ primary_care_trust                   <chr> \"Heart of Birmingham Teaching\"\n$ region                               <chr> \"West Midlands\"\n$ lsoa                                 <chr> \"Birmingham 138A\"\n$ msoa                                 <chr> \"Birmingham 138\"\n$ incode                               <chr> \"4BJ\"\n$ outcode                              <chr> \"B2\"\n$ parliamentary_constituency           <chr> \"Birmingham, Ladywood\"\n$ parliamentary_constituency_2024      <chr> \"Birmingham Ladywood\"\n$ admin_district                       <chr> \"Birmingham\"\n$ parish                               <chr> \"Birmingham, unparished area\"\n$ admin_county                         <lgl> NA\n$ date_of_introduction                 <chr> \"198001\"\n$ admin_ward                           <chr> \"Ladywood\"\n$ ced                                  <lgl> NA\n$ ccg                                  <chr> \"NHS Birmingham and Solihull\"\n$ nuts                                 <chr> \"Birmingham\"\n$ pfa                                  <chr> \"West Midlands\"\n$ admin_district_code                  <chr> \"E08000025\"\n$ admin_county_code                    <chr> \"E99999999\"\n$ admin_ward_code                      <chr> \"E05011151\"\n$ parish_code                          <chr> \"E43000250\"\n$ parliamentary_constituency_code      <chr> \"E14000564\"\n$ parliamentary_constituency_2024_code <chr> \"E14001096\"\n$ ccg_code                             <chr> \"E38000258\"\n$ ccg_id_code                          <chr> \"15E\"\n$ ced_code                             <chr> \"E99999999\"\n$ nuts_code                            <chr> \"TLG31\"\n$ lsoa_code                            <chr> \"E01033620\"\n$ msoa_code                            <chr> \"E02006899\"\n$ lau2_code                            <chr> \"E08000025\"\n$ pfa_code                             <chr> \"E23000014\"\n\n\n\n\n\nlocation <- location_raw |>\n  st_as_sf(coords = c(\"eastings\", \"northings\"), crs = 27700) |>\n  select(postcode, ccg) |>\n  st_transform(crs = 4326)\n\nlocation\n\nSimple feature collection with 1 feature and 2 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.900335 ymin: 52.47886 xmax: -1.900335 ymax: 52.47886\nGeodetic CRS: WGS 84\n postcode ccg geometry\n1 B2 4BJ NHS Birmingham and Solihull POINT (-1.900335 52.47886)"
+    "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver",
+    "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--silver",
+    "title": "RAP",
+    "section": "Levels of RAP- Silver",
+    "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\nData is handled and output in a Tidy data format\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#creating-a-geospatial-data-frame-for-all-nhs-trusts",
- "title": "Coffee and Coding",
- "section": "Creating a geospatial data frame for all NHS Trusts",
- "text": "Creating a geospatial data frame for all NHS Trusts\n\n\n\n# using the NHSRtools package\n# remotes::install_github(\"NHS-R-Community/NHSRtools\")\ntrusts <- ods_get_trusts() |>\n filter(status == \"Active\") |>\n select(name, org_id, post_code) |>\n geocode(postalcode = \"post_code\") |>\n st_as_sf(coords = c(\"long\", \"lat\"), crs = 4326)\n\n\ntrusts |>\n leaflet() |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers(popup = ~name)"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#levels-of-rap--gold",
+ "title": "RAP",
+ "section": "Levels of RAP- Gold",
+ "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-are-the-nearest-trusts-to-our-location",
- "title": "Coffee and Coding",
- "section": "What are the nearest trusts to our location?",
- "text": "What are the nearest trusts to our location?\n\nnearest_trusts <- trusts |>\n mutate(\n distance = st_distance(geometry, location)[, 1]\n ) |>\n arrange(distance) |>\n head(5)\n\nnearest_trusts\n\nSimple feature collection with 5 features and 4 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -1.9384 ymin: 52.4533 xmax: -1.886282 ymax: 52.48764\nGeodetic CRS: WGS 84\n# A tibble: 5 × 5\n name org_id post_code geometry distance\n <chr> <chr> <chr> <POINT [°]> [m]\n1 BIRMINGHAM WOMEN'S AND CH… RQ3 B4 6NH (-1.894241 52.4849) 789.\n2 BIRMINGHAM AND SOLIHULL M… RXT B1 3RB (-1.917663 52.48416) 1313.\n3 BIRMINGHAM COMMUNITY HEAL… RYW B7 4BN (-1.886282 52.48754) 1356.\n4 SANDWELL AND WEST BIRMING… RXK B18 7QH (-1.930203 52.48764) 2246.\n5 UNIVERSITY HOSPITALS BIRM… RRK B15 2GW (-1.9384 52.4533) 3838."
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#a-learning-journey-to-get-you-there",
+ "title": "RAP",
+ "section": "A learning journey to get you there",
+ "text": "A learning journey to get you there\n\nCode style, organising your files\nFunctions and iteration\nGit and GitHub\nPackaging your code\nTesting\nPackage management and versioning"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-find-driving-routes-to-these-trusts",
- "title": "Coffee and Coding",
- "section": "Let’s find driving routes to these trusts",
- "text": "Let’s find driving routes to these trusts\n\nroutes <- nearest_trusts |>\n mutate(\n route = map(geometry, ~ osrmRoute(location, st_coordinates(.x)))\n ) |>\n st_drop_geometry() |>\n rename(straight_line_distance = distance) |>\n unnest(route) |>\n st_as_sf()\n\nroutes\n\nSimple feature collection with 5 features and 8 fields\nGeometry type: LINESTRING\nDimension: XY\nBounding box: xmin: -1.93846 ymin: 52.45316 xmax: -1.88527 ymax: 52.49279\nGeodetic CRS: WGS 84\n# A tibble: 5 × 9\n name org_id post_code straight_line_distance src dst duration distance\n <chr> <chr> <chr> [m] <chr> <chr> <dbl> <dbl>\n1 BIRMING… RQ3 B4 6NH 789. 1 dst 5.77 3.09\n2 BIRMING… RXT B1 3RB 1313. 1 dst 6.84 4.14\n3 BIRMING… RYW B7 4BN 1356. 1 dst 7.59 4.29\n4 SANDWEL… RXK B18 7QH 2246. 1 dst 8.78 4.95\n5 UNIVERS… RRK B15 2GW 3838. 1 dst 10.6 4.67\n# ℹ 1 more variable: geometry <LINESTRING [°]>"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#how-we-can-help-each-other-get-there",
+ "title": "RAP",
+ "section": "How we can help each other get there",
+ "text": "How we can help each other get there\n\nWork as a team!\nCoffee and coding!\nAsk for help!\nDo pair coding!\nGet your code reviewed!\nJoin the NHS-R/ NHSPycom communities"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#lets-show-the-routes",
- "title": "Coffee and Coding",
- "section": "Let’s show the routes",
- "text": "Let’s show the routes\n\nleaflet(routes) |>\n addTiles() |>\n addMarkers(data = location) |>\n addPolylines(color = \"black\", weight = 3, opacity = 1) |>\n addCircleMarkers(data = nearest_trusts, radius = 4, opacity = 1, fillOpacity = 1)"
+ "objectID": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca",
+ "href": "presentations/2023-03-09_midlands-analyst-rap/index.html#haca",
+ "title": "RAP",
+ "section": "HACA",
+ "text": "HACA\n\nThe first national analytics conference for health and care\nInsight to action!\nJuly 11th and 12th, University of Birmingham\nAccepting abstracts for short and long talks and posters\nAbstract deadline 27th March\nHelp is available (with abstract, poster, preparing presentation…)!\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#we-can-use-osrm-to-calculate-isochrones",
- "title": "Coffee and Coding",
- "section": "We can use {osrm} to calculate isochrones",
- "text": "We can use {osrm} to calculate isochrones\n\n\n\niso <- osrmIsochrone(location, breaks = seq(0, 60, 15), res = 10)\n\nisochrone_ids <- unique(iso$id)\n\npal <- colorFactor(\n viridis::viridis(length(isochrone_ids)),\n isochrone_ids\n)\n\nleaflet(location) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~ pal(id),\n color = \"#000000\",\n weight = 1\n )"
+    "objectID": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman",
+    "href": "presentations/2024-05-30_open-source-licensing/index.html#a-note-on-richard-stallman",
+    "title": "Open source licensing",
+    "section": "A note on Richard Stallman",
+    "text": "A note on Richard Stallman\n\nRichard Stallman has been heavily criticised for some of his views\nHe is hard to ignore when talking about open source so I am going to talk about him\nNothing in this talk should be read as endorsing any of his comments outside (or inside) the world of open source"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones",
- "title": "Coffee and Coding",
- "section": "What trusts are in the isochrones?",
- "text": "What trusts are in the isochrones?\nThe summarise() function will “union” the geometry\n\nsummarise(iso)\n\nSimple feature collection with 1 feature and 0 fields\nGeometry type: POLYGON\nDimension: XY\nBounding box: xmin: -2.913575 ymin: 51.98062 xmax: -0.8502164 ymax: 53.1084\nGeodetic CRS: WGS 84\n geometry\n1 POLYGON ((-1.541014 52.9693..."
+    "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source",
+    "href": "presentations/2024-05-30_open-source-licensing/index.html#the-origin-of-open-source",
+    "title": "Open source licensing",
+    "section": "The origin of open source",
+    "text": "The origin of open source\n\nIn the 50s and 60s source code was routinely shared with hardware, and users were often expected to modify it to run on their hardware\nBy the late 1960s the production cost of software was rising relative to hardware and proprietary licences became more prevalent\nIn 1980 Richard Stallman’s department at MIT took delivery of a printer they were not able to modify the source code for\nRichard Stallman launched the GNU project in 1983 to fight for software freedoms\nThe MIT licence was launched in the late 1980s\nThe Cathedral and the Bazaar was released in 1997 (more on which later)"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-1",
- "title": "Coffee and Coding",
- "section": "What trusts are in the isochrones?",
- "text": "What trusts are in the isochrones?\nWe can use this with a geo-filter to find the trusts in the isochrone\n\n# also works\ntrusts_in_iso <- trusts |>\n st_filter(\n summarise(iso),\n .predicate = st_within\n )\n\ntrusts_in_iso\n\nSimple feature collection with 31 features and 3 fields\nGeometry type: POINT\nDimension: XY\nBounding box: xmin: -2.793386 ymin: 52.19205 xmax: -1.10302 ymax: 53.01015\nGeodetic CRS: WGS 84\n# A tibble: 31 × 4\n name org_id post_code geometry\n * <chr> <chr> <chr> <POINT [°]>\n 1 BIRMINGHAM AND SOLIHULL MENTAL HE… RXT B1 3RB (-1.917663 52.48416)\n 2 BIRMINGHAM COMMUNITY HEALTHCARE N… RYW B7 4BN (-1.886282 52.48754)\n 3 BIRMINGHAM WOMEN'S AND CHILDREN'S… RQ3 B4 6NH (-1.894241 52.4849)\n 4 BIRMINGHAM WOMEN'S NHS FOUNDATION… RLU B15 2TG (-1.942861 52.45325)\n 5 BURTON HOSPITALS NHS FOUNDATION T… RJF DE13 0RB (-1.656667 52.81774)\n 6 COVENTRY AND WARWICKSHIRE PARTNER… RYG CV6 6NY (-1.48692 52.45659)\n 7 DERBYSHIRE HEALTHCARE NHS FOUNDAT… RXM DE22 3LZ (-1.512896 52.91831)\n 8 DUDLEY INTEGRATED HEALTH AND CARE… RYK DY5 1RU (-2.11786 52.48176)\n 9 GEORGE ELIOT HOSPITAL NHS TRUST RLT CV10 7DJ (-1.47844 52.51258)\n10 HEART OF ENGLAND NHS FOUNDATION T… RR1 B9 5ST (-1.828759 52.4781)\n# ℹ 21 more rows"
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-open-source",
+ "title": "Open source licensing",
+ "section": "What is open source?",
+ "text": "What is open source?\n\nThink free as in free speech, not free beer (Stallman)\n\n\nOpen source does not mean free of charge! Software freedom implies the ability to sell code\nFree of charge does not mean open source! Many free to download pieces of software are not open source (Zoom, for example)\n\n\nBy Chao-Kuei et al. - https://www.gnu.org/philosophy/categories.en.html, GPL, Link"
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#what-trusts-are-in-the-isochrones-2",
- "title": "Coffee and Coding",
- "section": "What trusts are in the isochrones?",
- "text": "What trusts are in the isochrones?\n\n\n\nleaflet(trusts_in_iso) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = iso,\n fillColor = ~pal(id),\n color = \"#000000\",\n weight = 1\n )"
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#the-four-freedoms",
+ "title": "Open source licensing",
+ "section": "The four freedoms",
+ "text": "The four freedoms\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 1: The freedom to study how the program works, and change it to make it do what you wish.\nFreedom 2: The freedom to redistribute and make copies so you can help your neighbor.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits."
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#doing-the-same-but-within-a-radius",
- "title": "Coffee and Coding",
- "section": "Doing the same but within a radius",
- "text": "Doing the same but within a radius\n\n\n\nr <- 25000\n\ntrusts_in_radius <- trusts |>\n st_filter(\n location,\n .predicate = st_is_within_distance,\n dist = r\n )\n\n# transforming gives us a pretty smooth circle\nradius <- location |>\n st_transform(crs = 27700) |>\n st_buffer(dist = r) |>\n st_transform(crs = 4326)\n\nleaflet(trusts_in_radius) |>\n addProviderTiles(\"Stamen.TonerLite\") |>\n addMarkers() |>\n addPolygons(\n data = radius,\n color = \"#000000\",\n weight = 1\n )"
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar",
+ "title": "Open source licensing",
+ "section": "Cathedral and the bazaar",
+ "text": "Cathedral and the bazaar\n\nEvery good work of software starts by scratching a developer’s personal itch.\nGood programmers know what to write. Great ones know what to rewrite (and reuse).\nPlan to throw one [version] away; you will, anyhow (copied from Frederick Brooks’s The Mythical Man-Month).\nIf you have the right attitude, interesting problems will find you.\nWhen you lose interest in a program, your last duty to it is to hand it off to a competent successor.\nTreating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.\nRelease early. Release often. And listen to your customers.\nGiven a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix obvious to someone.\nSmart data structures and dumb code works a lot better than the other way around.\nIf you treat your beta-testers as if they’re your most valuable resource, they will respond by becoming your most valuable resource."
},
{
- "objectID": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading",
- "href": "presentations/2023-08-24_coffee-and-coding_geospatial/index.html#further-reading",
- "title": "Coffee and Coding",
- "section": "Further reading",
- "text": "Further reading\n\nGeocomputation with R\nr-spatial\n{sf} documentation\nLeaflet documentation\nTidy Geospatial Networks in R\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#cathedral-and-the-bazaar-cont.",
+ "title": "Open source licensing",
+ "section": "Cathedral and the bazaar (cont.)",
+ "text": "Cathedral and the bazaar (cont.)\n\nThe next best thing to having good ideas is recognizing good ideas from your users. Sometimes the latter is better.\nOften, the most striking and innovative solutions come from realizing that your concept of the problem was wrong.\nPerfection (in design) is achieved not when there is nothing more to add, but rather when there is nothing more to take away. (Attributed to Antoine de Saint-Exupéry)\nAny tool should be useful in the expected way, but a truly great tool lends itself to uses you never expected.\nWhen writing gateway software of any kind, take pains to disturb the data stream as little as possible—and never throw away information unless the recipient forces you to!\nWhen your language is nowhere near Turing-complete, syntactic sugar can be your friend.\nA security system is only as secure as its secret. Beware of pseudo-secrets.\nTo solve an interesting problem, start by finding a problem that is interesting to you.\nProvided the development coordinator has a communications medium at least as good as the Internet, and knows how to lead without coercion, many heads are inevitably better than one."
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#how-did-we-get-here",
- "title": "Agile and scrum working",
- "section": "How did we get here?",
- "text": "How did we get here?\n\nWaterfall approaches were used in the early days of software development\n\nRequirements; Design; Development; Integration; Testing; Deployment\n\nYou only move to the next stage when the first one is complete\n(although actually it turns out you kind of don’t…)"
+    "objectID": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science",
+    "href": "presentations/2024-05-30_open-source-licensing/index.html#the-disciplines-of-open-source-are-the-disciplines-of-good-data-science",
+    "title": "Open source licensing",
+    "section": "The disciplines of open source are the disciplines of good data science",
+    "text": "The disciplines of open source are the disciplines of good data science\n\nMeaningful README\nMeaningful commit messages\nModularity\nSeparating data code from analytic code from interactive code\nAssigning issues and pull requests for action/ review\nDon’t forget that one of the laziest and most incompetent developers you will ever work with is yourself, six months later"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-road-to-agile",
- "title": "Agile and scrum working",
- "section": "The road to agile",
- "text": "The road to agile\n\nSome of the ideas for agile floated around in the 20th century\nShewart’s Plan-Do-Study-Act cycle\nThe New New Product Development Game in 1986\nScrum (which we’ll return to) was proposed in 1993\nIn 2001 the Manifesto for Agile Software Development was published"
+    "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist",
+    "href": "presentations/2024-05-30_open-source-licensing/index.html#what-licences-exist",
+    "title": "Open source licensing",
+    "section": "What licences exist?",
+    "text": "What licences exist?\n\nPermissive\n\nSuch as MIT, but there are others. Recommended by NHSX draft guidelines on open source\nApache is a notable permissive licence- includes a patent licence\nIn our work the OGL is also relevant- civil servants publish stuff under OGL (and MIT- OGL isn’t particularly recommended for code)\n\nCopyleft\n\nGPL2, GPL3, AGPL (“the GPL of the web”)\nNote that the provisions of the GPL only apply when you distribute the code\nAt a certain point it all gets too complicated and you need a lawyer\nMPL is a notable copyleft licence- can combine with proprietary code as long as kept separate\n\nArguments for permissive/ copyleft- getting your code used versus preserving software freedoms for other people\nNote that most of the licences are impossible to read! There is a tl;dr website to explain them"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-manifesto",
- "title": "Agile and scrum working",
- "section": "The agile manifesto",
- "text": "The agile manifesto\n\nCopyright © 2001 Kent Beck, Mike Beedle, Arie van Bennekum, Alistair Cockburn, Ward Cunningham, Martin Fowler, James Grenning, Jim Highsmith, Andrew Hunt, Ron Jeffries, Jon Kern, Brian Marick\nRobert C. Martin, Steve Mellor, Ken Schwaber, Jeff Sutherland, Dave Thomas\nthis declaration may be freely copied in any form, but only in its entirety through this notice."
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#what-is-copyright-and-why-does-it-matter",
+ "title": "Open source licensing",
+ "section": "What is copyright and why does it matter",
+    "text": "What is copyright and why does it matter\n\nCopyright is assigned at the moment of creation\nIf you made it in your own time, it’s yours (usually!)\nIf you made it at work, it belongs to your employer\nIf someone paid you to make it (“work for hire”) it belongs to them\nCrucially, the copyright holder can relicense software\n\nIf it’s jointly authored it depends whether it’s a “collective” or “joint” work\nHonestly it’s pretty complicated. Just vest copyright in an organisation or group of individuals you trust\nGoldacre review suggests using Crown copyright in the NHS because it’s a “shoal, not a big fish” (with apologies to Ben whom I am misquoting)"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--software-and-the-mvp",
- "title": "Agile and scrum working",
- "section": "Agile principles- software and the MVP",
- "text": "Agile principles- software and the MVP\n\nOur highest priority is to satisfy the customer through early and continuous delivery of valuable software.\nDeliver working software frequently, from a couple of weeks to a couple of months, with a preference to the shorter timescale.\nWorking software is the primary measure of progress.\n\n(these principles and those on following slides copyright Ibid.)"
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#iceweasel",
+ "title": "Open source licensing",
+ "section": "Iceweasel",
+ "text": "Iceweasel\n\nIceweasel is a story of trademark rather than copyright\nDebian (a Linux flavour) had the permission to use the source code of Firefox, but not the logo\nSo they took the source code and made their own version\nThis sounds very obscure and unimportant but it could become important in future projects of ours, like…"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--working-with-customers",
- "title": "Agile and scrum working",
- "section": "Agile principles- working with customers",
- "text": "Agile principles- working with customers\n\nWelcome changing requirements, even late in development. Agile processes harness change for the customer’s competitive advantage.\nBusiness people and developers must work together daily throughout the project."
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#what-we-have-learned-in-recent-projects",
+ "title": "Open source licensing",
+ "section": "What we have learned in recent projects",
+ "text": "What we have learned in recent projects\n\nThe huge benefits of being open\n\nTransparency\nWorking with customers\nGoodwill\n\nNonfree mitigators\nDifferent licences for different repos"
},
- {
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--teamwork",
- "title": "Agile and scrum working",
- "section": "Agile principles- teamwork",
- "text": "Agile principles- teamwork\n\nBuild projects around motivated individuals. Give them the environment and support they need, and trust them to get the job done.\nThe most efficient and effective method of conveying information to and within a development team is face-to-face conversation.\nThe best architectures, requirements, and designs emerge from self-organizing teams.\nAt regular intervals, the team reflects on how to become more effective, then tunes and adjusts its behavior accordingly."
+ {
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#software-freedom-means-allowing-people-to-do-stuff-you-dont-like",
+ "title": "Open source licensing",
+ "section": "Software freedom means allowing people to do stuff you don’t like",
+ "text": "Software freedom means allowing people to do stuff you don’t like\n\nFreedom 0: The freedom to use the program for any purpose.\nFreedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.\nThe code isn’t the only thing with worth in the project\nThis is why there are whole businesses founded on “here’s the Linux source code”\nSo when we’re sharing code we are letting people do stupid things with it but we’re not recommending that they do stupid things with it\nPeople do stupid things with Excel and Microsoft don’t accept liability for that, and neither should we\nThis issue of sharing analytic code and merchantability for a particular purpose is poorly understood and I think everyone needs to be clearer on it (us, and our customers)\nIn my view a world where consultants are selling our code is better than a world where they’re selling their spreadsheets"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-principles--project-management",
- "title": "Agile and scrum working",
- "section": "Agile principles- project management",
- "text": "Agile principles- project management\n\nAgile processes promote sustainable development. The sponsors, developers, and users should be able to maintain a constant pace indefinitely.\nContinuous attention to technical excellence and good design enhances agility.\nSimplicity–the art of maximizing the amount of work not done–is essential."
+ "objectID": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano",
+ "href": "presentations/2024-05-30_open-source-licensing/index.html#open-source-as-in-piano",
+ "title": "Open source licensing",
+ "section": "“Open source as in piano”",
+ "text": "“Open source as in piano”\n\nThe patient experience QDC project\nOur current project\nOpen source code is not necessarily to be run, but understood and learned from\nBuilding a group of people who can use and contribute to your code is arguably as important as writing it\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-agile-advantage",
- "title": "Agile and scrum working",
- "section": "The agile advantage",
- "text": "The agile advantage\n\nBetter use of fixed resources to deliver an unknown outcome, rather than unknown resources to deliver a fixed outcome\nContinuous delivery"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-data-science",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-data-science",
+ "title": "Travels with R and Python",
+ "section": "What is data science?",
+ "text": "What is data science?\n\n“A data scientist knows more about computer science than the average statistician, and more about statistics than the average computer scientist”\n\n(Josh Wills, a former head of data engineering at Slack)"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#feature-creep",
- "title": "Agile and scrum working",
- "section": "Feature creep",
- "text": "Feature creep\n\nUsers ask for: everything they need, everything they think they may need, everything they want, everything they think they may want\n\n“every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can”\n\nZawinski’s Law- Source"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#drew-conways-famous-venn-diagram",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#drew-conways-famous-venn-diagram",
+ "title": "Travels with R and Python",
+ "section": "Drew Conway’s famous Venn diagram",
+ "text": "Drew Conway’s famous Venn diagram\n\nSource"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#regular-stakeholder-feedback",
- "title": "Agile and scrum working",
- "section": "Regular stakeholder feedback",
-    "text": "Regular stakeholder feedback\n\nAgile teams are very responsive to product feedback\nThe project we’re currently working on is very agile whether we like it or not\nOur customers never know what they want until we show them something they don’t want"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science",
+ "title": "Travels with R and Python",
+ "section": "What are the skills of data science?",
+ "text": "What are the skills of data science?\n\nAnalysis\n\nML\nStats\nData viz\n\nSoftware engineering\n\nProgramming\nSQL/ data\nDevOps\nRAP"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#more-agile-advantages",
- "title": "Agile and scrum working",
- "section": "More agile advantages",
- "text": "More agile advantages\n\nEarly and cheap failure\nContinuous testing and QA\nReduction in unproductive work\nTeam can improve regularly, not just the product"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science-1",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-are-the-skills-of-data-science-1",
+ "title": "Travels with R and Python",
+ "section": "What are the skills of data science?",
+ "text": "What are the skills of data science?\n\nDomain knowledge\n\nCommunication\nProblem formulation\nDashboards and reports"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#agile-methods",
- "title": "Agile and scrum working",
- "section": "Agile methods",
- "text": "Agile methods\n\nThere are lots of agile methodologies\nI’m not going to embarrass myself by pretending to understand them\nExamples include Lean, Crystal, and Extreme Programming"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#stats-and-data-viz",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#stats-and-data-viz",
+ "title": "Travels with R and Python",
+ "section": "Stats and data viz",
+ "text": "Stats and data viz\n\nML leans a bit more towards atheoretical prediction\nStats leans a bit more towards inference (but they both do both)\nData scientists may use different visualisations\n\nInteractive web based tools\nDashboard based visualisers e.g. {stminsights}"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum",
- "title": "Agile and scrum working",
- "section": "Scrum",
-    "text": "Scrum\n\nScrum is the agile methodology we have adopted\nDespite dire warnings to the contrary we have not adopted it wholesale, but we do follow most of its principles\nThe fundamental organising principle of work in scrum is a sprint lasting 1-4 weeks\nEach sprint finishes with a defined and useful piece of software that can be shown to/ used by customers"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#software-engineering",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#software-engineering",
+ "title": "Travels with R and Python",
+ "section": "Software engineering",
+ "text": "Software engineering\n\nProgramming\n\nNo/ low code data science?\n\nSQL/ data\n\nTend to use reproducible automated processes\n\nDevOps\n\nPlan, code, build, test, release, deploy, operate, monitor\n\nRAP\n\nI will come back to this"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#product-owner",
- "title": "Agile and scrum working",
- "section": "Product owner",
-    "text": "Product owner\n\nThis person is responsible for the backlog- what goes into the sprint\nThe backlog should be inclusive of all of the things that customers want or might want\nThe backlog should be prioritised\nThe product owner does this through deep and frequent conversations with customers"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#domain-knowledge",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#domain-knowledge",
+ "title": "Travels with R and Python",
+ "section": "Domain knowledge",
+ "text": "Domain knowledge\n\nDo stuff that matters\n\nThe best minds of my generation are thinking about how to make people click ads. That sucks. Jeffrey Hammerbacher\n\nConvince other people that it matters\nThis is the hardest part of data science"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-master-helps-the-scrum-team",
- "title": "Agile and scrum working",
- "section": "Scrum master helps the scrum team",
- "text": "Scrum master helps the scrum team\n\n“By coaching the team members in self-management and cross-functionality\nFocus on creating high-value Increments that meet the Definition of Done\nInfluence the removal of impediments to the Scrum Team’s progress\nEnsure that all Scrum events take place and are positive, productive, and kept within the timebox.”\n\nSource"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#rap",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#rap",
+ "title": "Travels with R and Python",
+ "section": "RAP",
+ "text": "RAP\n\nData science isn’t RAP\nRAP isn’t data science\nThey are firm friends"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#the-backlog",
- "title": "Agile and scrum working",
- "section": "The backlog",
- "text": "The backlog\n\nHaving an accurate and well prioritised backlog is key\nDon’t estimate the backlog in hours- use “T shirt sizes” or “points”\nPeople are terrible at estimating how long things take- particularly in software\nEverything in the backlog needs a defined “Done” state"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#reproducibility",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#reproducibility",
+ "title": "Travels with R and Python",
+ "section": "Reproducibility",
+    "text": "Reproducibility\n\nReproducibility in science\nThe $6B spreadsheet error\nGeorge Osborne’s austerity was based on a spreadsheet error\nFor us, reproducibility also means we can do the same analysis 50 times in one minute\n\nWhich is why I started down the road of data science"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-planning",
- "title": "Agile and scrum working",
- "section": "Sprint planning",
- "text": "Sprint planning\n\nThe team, the product owner, and the scrum master plan the sprint\nSprints should be a fixed length of time less than one month\nThe sprint cannot be changed or added to (we break this rule)\nThe team works autonomously in the sprint- nobody decides who does what except the team\nCan take three hours and should if it needs to"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-rap",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#what-is-rap",
+ "title": "Travels with R and Python",
+ "section": "What is RAP",
+ "text": "What is RAP\n\na process in which code is used to minimise manual, undocumented steps, and a clear, properly documented process is produced in code which can reliably give the same result from the same dataset\nRAP should be:\n\n\nthe core working practice that must be supported by all platforms and teams; make this a core focus of NHS analyst training\n\n\nGoldacre review"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#standup",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#standup",
- "title": "Agile and scrum working",
- "section": "Standup",
-    "text": "Standup\n\nEvery day, for no more than 15 minutes (teams often stand up to reinforce this rule), the team and scrum master meet\nEach person answers three questions\n\nWhat did you do yesterday to help the team finish the sprint?\nWhat will you do today to help the team finish the sprint?\nIs there an obstacle blocking you or the team from achieving the sprint goal?"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--baseline",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--baseline",
+ "title": "Travels with R and Python",
+ "section": "Levels of RAP- Baseline",
+ "text": "Levels of RAP- Baseline\n\nData produced by code in an open-source language (e.g., Python, R, SQL)\nCode is version controlled\nRepository includes a README.md file that clearly details steps a user must follow to reproduce the code\nCode has been peer reviewed\nCode is published in the open and linked to & from accompanying publication (if relevant)\n\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#sprint-retro",
- "title": "Agile and scrum working",
- "section": "Sprint retro",
- "text": "Sprint retro\n\nWhat went well, what could have gone better, and what to improve next time\nLooking at process, not blaming individuals\nRequires maturity and trust to bring up issues, and to respond to them in a constructive way\nShould agree at the end on one process improvement which goes in the next sprint\nWe’ve had some really, really good retros and I think it’s a really important process for a team"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--silver",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--silver",
+ "title": "Travels with R and Python",
+ "section": "Levels of RAP- Silver",
+ "text": "Levels of RAP- Silver\n\nCode is well-documented…\nCode is well-organised following standard directory format\nReusable functions and/or classes are used where appropriate\nPipeline includes a testing framework\nRepository includes dependency information (e.g. requirements.txt, PipFile, environment.yml)\nData is handled and output in a Tidy data format\n\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#team-perspective",
- "title": "Agile and scrum working",
- "section": "Team perspective",
- "text": "Team perspective\n\nProduct owner- that’s me\n\nFocus, clarity and transparency, team delivery, clear and appropriate responsibilities\n\nScrum master- YiWen\nTeam member- Matt\nTeam member- Rhian"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--gold",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#levels-of-rap--gold",
+ "title": "Travels with R and Python",
+ "section": "Levels of RAP- Gold",
+ "text": "Levels of RAP- Gold\n\nCode is fully packaged\nRepository automatically runs tests etc. via CI/CD or a different integration/deployment tool e.g. GitHub Actions\nProcess runs based on event-based triggers (e.g., new data in database) or on a schedule\nChanges to the RAP are clearly signposted. E.g. a changelog in the package, releases etc. (See gov.uk info on Semantic Versioning)\n\n\nSource: NHS Digital RAP community of practice"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#scrum-values",
- "title": "Agile and scrum working",
- "section": "Scrum values",
- "text": "Scrum values\n\nCourage\nFocus\nCommitment\nRespect\nOpenness"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#data-science-in-healthcare",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#data-science-in-healthcare",
+ "title": "Travels with R and Python",
+ "section": "Data science in healthcare",
+ "text": "Data science in healthcare\n\nForecasting\n\nStats versus ML\n\nText mining\n\nR versus Python\n\nDemand modelling\n\nDevOps as a way of life"
},
{
- "objectID": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software",
- "href": "presentations/2024-08-22_agile-and-scrum/index.html#using-agile-outside-of-software",
- "title": "Agile and scrum working",
- "section": "Using agile outside of software",
- "text": "Using agile outside of software\n\nData science is outside of software (IMHO)\n\nWe don’t have daily standups and some of our processes run longer than in software development\n\nYou can build cars with Agile\nMarketing and UX design\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#get-involved",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#get-involved",
+ "title": "Travels with R and Python",
+ "section": "Get involved!",
+ "text": "Get involved!\n\nNHS-R community\n\nWebinars, training, conference, Slack\n\nNHS Pycom\n\nditto…\n\nMLCSU GitHub?\nBuild links with the other CSUs"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#targets-for-analysts",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#targets-for-analysts",
- "title": "Coffee and Coding",
- "section": "{targets} for analysts",
- "text": "{targets} for analysts\n\n\n\nTom previously presented about {targets} at a coffee and coding last March and you can revisit his presentation and learn about the reasons why you should use the package to manage your pipeline and see a simple demonstration of how to use the package.\nMatt has presented previously about {targets} and making your workflows (pipelines) reproducible.\nSo….. if you aren’t really even sure why your pipeline needs managing as an analyst or whether you actually have one (you do) then links to their presentations are at the end"
+ "objectID": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#contact",
+ "href": "presentations/2023-08-02_mlcsu-ksn-meeting/index.html#contact",
+ "title": "Travels with R and Python",
+ "section": "Contact",
+ "text": "Contact\n\n\n\n\n strategy.unit@nhs.net\n The-Strategy-Unit\n\n\n\n\n\n chris.beeley1@nhs.net\n chrisbeeley\n\n\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#aims",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#aims",
- "title": "Coffee and Coding",
- "section": "Aims",
-    "text": "Aims\n\nIn this presentation we aim to demonstrate the real-world use of {targets} in an analysis project, but first a brief explanation\n\n\n\nWithout {targets} we\n\n\nWrite a script\nExecute script\nMake changes\nGo to step 2\n\n\n\nWith {targets} we will\n\n\nlearn how the various stages of our analysis fit together\nsave time by only running necessary stages as we cycle through the process\nhelp future you and colleagues re-visiting the analysis - Matt says “it’s like a time capsule”\nmake Reproducible Analytical Pipelines\n\n\nsource: The {targets} R package user manual"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#section",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#section",
+ "title": "Conference Check-in App",
+ "section": "",
+ "text": "digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/\n\n\nClark, Junebug. [Registration Desk for the LPC Conference], photograph, 2016-03-17/2016-03-19; (https://digital.library.unt.edu/ark:/67531/metadc1039451/m1/1/: accessed October 16, 2023), University of North Texas Libraries, UNT Digital Library, https://digital.library.unt.edu; crediting UNT Libraries Special Collections."
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#explain-the-live-project",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#explain-the-live-project",
- "title": "Coffee and Coding",
- "section": "Explain the live project",
- "text": "Explain the live project\n\noriginal project had 30+ metrics\nmultiple inter-related processing steps\neach time a metric changed or a process was altered it impacted across the project\nthere was potential for mistakes, duplication, lots of wasted time\nusing targets provides a structure that handles these inter-relationships"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#qr-codes-are-great",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#qr-codes-are-great",
+ "title": "Conference Check-in App",
+ "section": "QR codes are great",
+ "text": "QR codes are great"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#how-targets-can-help",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#how-targets-can-help",
- "title": "Coffee and Coding",
- "section": "How {targets} can help",
- "text": "How {targets} can help\n\ngets you thinking about your analysis and its building blocks\ntargets forces you into a functions approach to workflow\nentire pipeline is reproducible\nvisualise on one page\nsaves time\n(maybe we need an advanced function writing session in another C&C?)"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#and-can-be-easily-generated-in-r",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#and-can-be-easily-generated-in-r",
+ "title": "Conference Check-in App",
+ "section": "and can be easily generated in R",
+ "text": "and can be easily generated in R\ninstall.packages(\"qrcode\")\nlibrary(qrcode)\n\nqr_code(\"https://www.youtube.com/watch?v=dQw4w9WgXcQ\")"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#demonstration-in-a-live-project",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#demonstration-in-a-live-project",
- "title": "Coffee and Coding",
- "section": "Demonstration in a live project",
- "text": "Demonstration in a live project\nLet’s look at a real life example in a live project…"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#why-not",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#why-not",
+ "title": "Conference Check-in App",
+ "section": "Why not?",
+ "text": "Why not?\n\n{shiny} would be doing all the processing on the server side\nwe would need to read from a camera client side\nthen stream video to the server for {shiny} to detect and decode the QR codes"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#visualising",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#visualising",
- "title": "Coffee and Coding",
- "section": "Visualising",
- "text": "Visualising\nCurrent project in {targets} and visualised with tar_visnetwork()"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work",
+ "title": "Conference Check-in App",
+ "section": "How does this work?",
+    "text": "How does this work?\n\n\nFront-end\n\n\nuses the React JavaScript framework\n@yidel/react-qr-scanner\nThe app scans a QR code, then sends this to our backend\nA window pops up to say who has checked in, or shows an error message"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#code",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#code",
- "title": "Coffee and Coding",
- "section": "Code",
- "text": "Code\n\nit’s like a recipe of steps\nit’s easier to read\nyou have built functions which you can transfer and reuse\nit’s efficient, good practice\ndebugging is easier because if/when it fails you know exactly which target it has failed on\nit creates intermediate cached objects you can fetch at any time"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-1",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-1",
+ "title": "Conference Check-in App",
+ "section": "How does this work?",
+ "text": "How does this work?\nBack-end\nUses the {plumber} R package to build the API, with endpoints for\n\ngetting the list of all of the attendees for that day\nuploading a list of attendees in bulk\nadding an attendee individually\ngetting an attendee\nchecking the attendee in"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#how-can-i-start-using-it",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#how-can-i-start-using-it",
- "title": "Coffee and Coding",
- "section": "How can I start using it?",
- "text": "How can I start using it?\n\nYou could “retro-fit” it to your project, but … ideally you should start your project off using {targets}\nThere are at least three of us in SU who have used it in our projects.\nWe are offering to hand hold you to get started with your next project.\nMatt, Tom, Jacqueline"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-2",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#how-does-this-work-2",
+ "title": "Conference Check-in App",
+ "section": "How does this work?",
+ "text": "How does this work?\nMore Back-end Stuff\n\nuses a simple SQLite DB that will be thrown away at the end of the conference\nwe send personalised emails using {blastula} to the attendees with their QR codes\nthe QR codes are just random ids (UUIDs) that identify each attendee\nuses websockets to update all of the clients when a user checks in (to update the list of attendees)"
},
{
- "objectID": "presentations/2024-01-25_coffee-and-coding/index.html#useful-targets-links",
- "href": "presentations/2024-01-25_coffee-and-coding/index.html#useful-targets-links",
- "title": "Coffee and Coding",
- "section": "Useful {targets} links",
-    "text": "Useful {targets} links\n\nTom’s previous coffee and coding presentation\nMatt’s previous presentations\nThe {targets} documentation is detailed and easy to follow.\nA demo repository demonstrated in last week’s NHSE C&C\nSoftware Carpentry are developing a course: Pre-alpha targets course\nLive project demonstrated in this presentation using {targets}\n\n\n\n\nview slides at the-strategy-unit.github.io/data_science/presentations"
+ "objectID": "presentations/2023-10-17_conference-check-in-app/index.html#learning-different-tools-can-show-you-the-light",
+ "href": "presentations/2023-10-17_conference-check-in-app/index.html#learning-different-tools-can-show-you-the-light",
+ "title": "Conference Check-in App",
+ "section": "Learning different tools can show you the light",
+ "text": "Learning different tools can show you the light\n\nunsplash.com/photos/tMGMINwFOtI"
},
{
"objectID": "presentations/2023-05-15_text-mining/index.html#patient-experience",
diff --git a/site_libs/revealjs/dist/theme/quarto.css b/site_libs/revealjs/dist/theme/quarto.css
index 7f9c582..a5047c5 100644
--- a/site_libs/revealjs/dist/theme/quarto.css
+++ b/site_libs/revealjs/dist/theme/quarto.css
@@ -5,4 +5,4 @@
* we also add `bright-[color]-` synonyms for the `-[color]-intense` classes since
* that seems to be what ansi_up emits
*
-*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-intense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #f5f4f3;--quarto-body-color: #635a54;--quarto-text-muted: #b0a7a0;--quarto-border-color: #f5f4f4;--quarto-border-width: 1px;--quarto-border-radius: 
4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode .code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #f5f4f3;--mermaid-edge-color: #999;--mermaid-node-fg-color: #635a54;--mermaid-fg-color: #635a54;--mermaid-fg-color--lighter: #7f746b;--mermaid-fg-color--lightest: #988d85;--mermaid-font-family: Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #f5f4f3;--mermaid-label-fg-color: 
#468;--mermaid-node-bg-color: rgba(68, 102, 136, 0.1);--mermaid-node-fg-color: #635a54}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #f5f4f4;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid rgba(0,0,0,0);border-top-color:#f5f4f4;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#f5f4f4}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #f5f4f4;border-bottom-color:#f5f4f3}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #f5f4f4}}.code-with-filename 
.code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}.reveal.center .slide aside,.reveal.center .slide div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #f5f4f4;border-radius:4px}.reveal 
pre{width:100%;box-shadow:none;background-color:#f5f4f3;border:none;margin:0;font-size:.55em}.reveal .code-with-filename .code-with-filename-file pre{background-color:unset}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#f5f4f3;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#f5f4f3;color:#635a54}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#f5f4f3}.reveal blockquote{display:block;position:relative;color:#b0a7a0;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #b0a7a0;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#b0a7a0}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left .panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal .panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#f5f4f4;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill 
:last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#7d9dcf;background-color:#f5f4f3}.reveal .footer{color:#b0a7a0}.reveal .footer a{color:#5881c1}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#b0a7a0}.reveal .slide-number.has-dark-background{color:#fff}.reveal .slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal .controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#f5f4f3;color:#635a54;border-radius:4px;border:solid 1px #b0a7a0;font-size:.6em}.tippy-box[data-theme~=light-border] 
.tippy-arrow{color:#b0a7a0}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-brush::before{background-image:url('data:image/svg+xml,')}/*! 
light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#f5f4f3;font-weight:bolder}.reveal 
pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #635a54 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation:first{margin-top:12px}.reveal .slide 
blockquote{border-left:3px solid #b0a7a0;padding-left:.6em;background:#f6f7f7}.reveal h2{padding-bottom:.3em}.reveal .footer{color:#f9bd07;background-color:#2c2825;display:block;position:fixed;bottom:0px !important;padding-bottom:12px;padding-top:12px;width:100%;text-align:center;font-size:18px;z-index:2}.reveal .slide ul li{list-style:none}.reveal .slide ul li::before{content:"◼";color:#f9bd07;display:inline-block;width:1.5em;margin-left:-1.5em;font-size:66%}.reveal .progress span{background-color:#f9bd07}.reveal .slide-logo{z-index:3;bottom:-5px !important}.slide-background:first-child{background-color:#2c2825;background-image:radial-gradient(#635a54 1%, transparent 5%);background-position:0 0,10px 10px;background-size:30px 30px;background-repeat:repeat;height:100%;width:100%}.slide-background:first-child .slide-background-content{background-image:url("https://the-strategy-unit.github.io/assets/logo_yellow.svg");top:.5em;right:.5em;height:4em;width:4em;position:absolute}#title-slide{text-align:left}#title-slide h1{font-size:2em;color:#f9bd07 !important}#title-slide .subtitle{color:#c7c1bc}#title-slide .quarto-title-authors{justify-content:left;display:block}#title-slide .quarto-title-author{padding-top:1em;color:#ec6555;padding:0}#title-slide .quarto-title-author a{color:#5881c1}#title-slide p.institute{font-size:.75em;color:#c7c1bc}#title-slide p.date{font-size:.5em;color:#988d85}.reveal .slide-logo:first-child{display:none}.inverse .slide-background-content{background-color:#2c2825}.inverse h1,.inverse h2,.inverse h3,.inverse h4{color:#988d85 !important}.reveal .imitate-title{font-size:2em}.reveal .inverse .imitate-title{color:#f9bd07 !important}.text-bottom{bottom:1em;position:absolute}.no-bullets ul{list-style-type:none;padding:0;margin:0}.small-table table{font-size:1.65rem}.small{font-size:1.2rem}.yellow{color:#f9bd07}.light-yellow{color:#fef2ce}.light-charcoal{color:#988d85}.very-light-charcoal{color:#c7c1bc}.center{text-align:center}/*# 
sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */
+*/.ansi-black-fg{color:#3e424d}.ansi-black-bg{background-color:#3e424d}.ansi-black-intense-black,.ansi-bright-black-fg{color:#282c36}.ansi-black-intense-black,.ansi-bright-black-bg{background-color:#282c36}.ansi-red-fg{color:#e75c58}.ansi-red-bg{background-color:#e75c58}.ansi-red-intense-red,.ansi-bright-red-fg{color:#b22b31}.ansi-red-intense-red,.ansi-bright-red-bg{background-color:#b22b31}.ansi-green-fg{color:#00a250}.ansi-green-bg{background-color:#00a250}.ansi-green-intense-green,.ansi-bright-green-fg{color:#007427}.ansi-green-intense-green,.ansi-bright-green-bg{background-color:#007427}.ansi-yellow-fg{color:#ddb62b}.ansi-yellow-bg{background-color:#ddb62b}.ansi-yellow-intense-yellow,.ansi-bright-yellow-fg{color:#b27d12}.ansi-yellow-intense-yellow,.ansi-bright-yellow-bg{background-color:#b27d12}.ansi-blue-fg{color:#208ffb}.ansi-blue-bg{background-color:#208ffb}.ansi-blue-intense-blue,.ansi-bright-blue-fg{color:#0065ca}.ansi-blue-intense-blue,.ansi-bright-blue-bg{background-color:#0065ca}.ansi-magenta-fg{color:#d160c4}.ansi-magenta-bg{background-color:#d160c4}.ansi-magenta-intense-magenta,.ansi-bright-magenta-fg{color:#a03196}.ansi-magenta-intense-magenta,.ansi-bright-magenta-bg{background-color:#a03196}.ansi-cyan-fg{color:#60c6c8}.ansi-cyan-bg{background-color:#60c6c8}.ansi-cyan-intense-cyan,.ansi-bright-cyan-fg{color:#258f8f}.ansi-cyan-intense-cyan,.ansi-bright-cyan-bg{background-color:#258f8f}.ansi-white-fg{color:#c5c1b4}.ansi-white-bg{background-color:#c5c1b4}.ansi-white-intense-white,.ansi-bright-white-fg{color:#a1a6b2}.ansi-white-intense-white,.ansi-bright-white-bg{background-color:#a1a6b2}.ansi-default-inverse-fg{color:#fff}.ansi-default-inverse-bg{background-color:#000}.ansi-bold{font-weight:bold}.ansi-underline{text-decoration:underline}:root{--quarto-body-bg: #f5f4f3;--quarto-body-color: #635a54;--quarto-text-muted: #b0a7a0;--quarto-border-color: #f5f4f4;--quarto-border-width: 1px;--quarto-border-radius: 
4px}table.gt_table{color:var(--quarto-body-color);font-size:1em;width:100%;background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_column_spanner_outer{color:var(--quarto-body-color);background-color:rgba(0,0,0,0);border-top-width:inherit;border-bottom-width:inherit;border-color:var(--quarto-border-color)}table.gt_table th.gt_col_heading{color:var(--quarto-body-color);font-weight:bold;background-color:rgba(0,0,0,0)}table.gt_table thead.gt_col_headings{border-bottom:1px solid currentColor;border-top-width:inherit;border-top-color:var(--quarto-border-color)}table.gt_table thead.gt_col_headings:not(:first-child){border-top-width:1px;border-top-color:var(--quarto-border-color)}table.gt_table td.gt_row{border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-width:0px}table.gt_table tbody.gt_table_body{border-top-width:1px;border-bottom-width:1px;border-bottom-color:var(--quarto-border-color);border-top-color:currentColor}div.columns{display:initial;gap:initial}div.column{display:inline-block;overflow-x:initial;vertical-align:top;width:50%}.code-annotation-tip-content{word-wrap:break-word}.code-annotation-container-hidden{display:none !important}dl.code-annotation-container-grid{display:grid;grid-template-columns:min-content auto}dl.code-annotation-container-grid dt{grid-column:1}dl.code-annotation-container-grid dd{grid-column:2}pre.sourceCode.code-annotation-code{padding-right:0}code.sourceCode .code-annotation-anchor{z-index:100;position:relative;float:right;background-color:rgba(0,0,0,0)}input[type=checkbox]{margin-right:.5ch}:root{--mermaid-bg-color: #f5f4f3;--mermaid-edge-color: #999;--mermaid-node-fg-color: #635a54;--mermaid-fg-color: #635a54;--mermaid-fg-color--lighter: #7f746b;--mermaid-fg-color--lightest: #988d85;--mermaid-font-family: Source Sans Pro, Helvetica, sans-serif;--mermaid-label-bg-color: #f5f4f3;--mermaid-label-fg-color: 
#468;--mermaid-node-bg-color: rgba(68, 102, 136, 0.1);--mermaid-node-fg-color: #635a54}@media print{:root{font-size:11pt}#quarto-sidebar,#TOC,.nav-page{display:none}.page-columns .content{grid-column-start:page-start}.fixed-top{position:relative}.panel-caption,.figure-caption,figcaption{color:#666}}.code-copy-button{position:absolute;top:0;right:0;border:0;margin-top:5px;margin-right:5px;background-color:rgba(0,0,0,0);z-index:3}.code-copy-button:focus{outline:none}.code-copy-button-tooltip{font-size:.75em}pre.sourceCode:hover>.code-copy-button>.bi::before{display:inline-block;height:1rem;width:1rem;content:"";vertical-align:-0.125em;background-image:url('data:image/svg+xml,');background-repeat:no-repeat;background-size:1rem 1rem}pre.sourceCode:hover>.code-copy-button-checked>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button:hover>.bi::before{background-image:url('data:image/svg+xml,')}pre.sourceCode:hover>.code-copy-button-checked:hover>.bi::before{background-image:url('data:image/svg+xml,')}.panel-tabset [role=tablist]{border-bottom:1px solid #f5f4f4;list-style:none;margin:0;padding:0;width:100%}.panel-tabset [role=tablist] *{-webkit-box-sizing:border-box;box-sizing:border-box}@media(min-width: 30em){.panel-tabset [role=tablist] li{display:inline-block}}.panel-tabset [role=tab]{border:1px solid rgba(0,0,0,0);border-top-color:#f5f4f4;display:block;padding:.5em 1em;text-decoration:none}@media(min-width: 30em){.panel-tabset [role=tab]{border-top-color:rgba(0,0,0,0);display:inline-block;margin-bottom:-1px}}.panel-tabset [role=tab][aria-selected=true]{background-color:#f5f4f4}@media(min-width: 30em){.panel-tabset [role=tab][aria-selected=true]{background-color:rgba(0,0,0,0);border:1px solid #f5f4f4;border-bottom-color:#f5f4f3}}@media(min-width: 30em){.panel-tabset [role=tab]:hover:not([aria-selected=true]){border:1px solid #f5f4f4}}.code-with-filename 
.code-with-filename-file{margin-bottom:0;padding-bottom:2px;padding-top:2px;padding-left:.7em;border:var(--quarto-border-width) solid var(--quarto-border-color);border-radius:var(--quarto-border-radius);border-bottom:0;border-bottom-left-radius:0%;border-bottom-right-radius:0%}.code-with-filename div.sourceCode,.reveal .code-with-filename div.sourceCode{margin-top:0;border-top-left-radius:0%;border-top-right-radius:0%}.code-with-filename .code-with-filename-file pre{margin-bottom:0}.code-with-filename .code-with-filename-file{background-color:rgba(219,219,219,.8)}.quarto-dark .code-with-filename .code-with-filename-file{background-color:#555}.code-with-filename .code-with-filename-file strong{font-weight:400}.reveal.center .slide aside,.reveal.center .slide div.aside{position:initial}section.has-light-background,section.has-light-background h1,section.has-light-background h2,section.has-light-background h3,section.has-light-background h4,section.has-light-background h5,section.has-light-background h6{color:#222}section.has-light-background a,section.has-light-background a:hover{color:#2a76dd}section.has-light-background code{color:#4758ab}section.has-dark-background,section.has-dark-background h1,section.has-dark-background h2,section.has-dark-background h3,section.has-dark-background h4,section.has-dark-background h5,section.has-dark-background h6{color:#fff}section.has-dark-background a,section.has-dark-background a:hover{color:#42affa}section.has-dark-background code{color:#ffa07a}#title-slide,div.reveal div.slides section.quarto-title-block{text-align:center}#title-slide .subtitle,div.reveal div.slides section.quarto-title-block .subtitle{margin-bottom:2.5rem}.reveal .slides{text-align:left}.reveal .title-slide h1{font-size:1.6em}.reveal[data-navigation-mode=linear] .title-slide h1{font-size:2.5em}.reveal div.sourceCode{border:1px solid #f5f4f4;border-radius:4px}.reveal 
pre{width:100%;box-shadow:none;background-color:#f5f4f3;border:none;margin:0;font-size:.55em}.reveal .code-with-filename .code-with-filename-file pre{background-color:unset}.reveal code{color:var(--quarto-hl-fu-color);background-color:rgba(0,0,0,0);white-space:pre-wrap}.reveal pre.sourceCode code{background-color:#f5f4f3;padding:6px 9px;max-height:500px;white-space:pre}.reveal pre code{background-color:#f5f4f3;color:#635a54}.reveal .column-output-location{display:flex;align-items:stretch}.reveal .column-output-location .column:first-of-type div.sourceCode{height:100%;background-color:#f5f4f3}.reveal blockquote{display:block;position:relative;color:#b0a7a0;width:unset;margin:var(--r-block-margin) auto;padding:.625rem 1.75rem;border-left:.25rem solid #b0a7a0;font-style:normal;background:none;box-shadow:none}.reveal blockquote p:first-child,.reveal blockquote p:last-child{display:block}.reveal .slide aside,.reveal .slide div.aside{position:absolute;bottom:20px;font-size:0.7em;color:#b0a7a0}.reveal .slide sup{font-size:0.7em}.reveal .slide.scrollable aside,.reveal .slide.scrollable div.aside{position:relative;margin-top:1em}.reveal .slide aside .aside-footnotes{margin-bottom:0}.reveal .slide aside .aside-footnotes li:first-of-type{margin-top:0}.reveal .layout-sidebar{display:flex;width:100%;margin-top:.8em}.reveal .layout-sidebar .panel-sidebar{width:270px}.reveal .layout-sidebar-left .panel-sidebar{margin-right:calc(0.5em*2)}.reveal .layout-sidebar-right .panel-sidebar{margin-left:calc(0.5em*2)}.reveal .layout-sidebar .panel-fill,.reveal .layout-sidebar .panel-center,.reveal .layout-sidebar .panel-tabset{flex:1}.reveal .panel-input,.reveal .panel-sidebar{font-size:.5em;padding:.5em;border-style:solid;border-color:#f5f4f4;border-width:1px;border-radius:4px;background-color:#f8f9fa}.reveal .panel-sidebar :first-child,.reveal .panel-fill :first-child{margin-top:0}.reveal .panel-sidebar :last-child,.reveal .panel-fill 
:last-child{margin-bottom:0}.panel-input>div,.panel-input>div>div{vertical-align:middle;padding-right:1em}.reveal p,.reveal .slides section,.reveal .slides section>section{line-height:1.3}.reveal.smaller .slides section,.reveal .slides section.smaller,.reveal .slides section .callout{font-size:0.7em}.reveal.smaller .slides section section{font-size:inherit}.reveal.smaller .slides h1,.reveal .slides section.smaller h1{font-size:calc(2.5em/0.7)}.reveal.smaller .slides h2,.reveal .slides section.smaller h2{font-size:calc(1.6em/0.7)}.reveal.smaller .slides h3,.reveal .slides section.smaller h3{font-size:calc(1.3em/0.7)}.reveal .columns>.column>:not(ul,ol){margin-left:.25em;margin-right:.25em}.reveal .columns>.column:first-child>:not(ul,ol){margin-right:.5em;margin-left:0}.reveal .columns>.column:last-child>:not(ul,ol){margin-right:0;margin-left:.5em}.reveal .slide-number{color:#7d9dcf;background-color:#f5f4f3}.reveal .footer{color:#b0a7a0}.reveal .footer a{color:#5881c1}.reveal .footer.has-dark-background{color:#fff}.reveal .footer.has-dark-background a{color:#7bc6fa}.reveal .footer.has-light-background{color:#505050}.reveal .footer.has-light-background a{color:#6a9bdd}.reveal .slide-number{color:#b0a7a0}.reveal .slide-number.has-dark-background{color:#fff}.reveal .slide-number.has-light-background{color:#505050}.reveal .slide figure>figcaption,.reveal .slide img.stretch+p.caption,.reveal .slide img.r-stretch+p.caption{font-size:0.7em}@media screen and (min-width: 500px){.reveal .controls[data-controls-layout=edges] .navigate-left{left:.2em}.reveal .controls[data-controls-layout=edges] .navigate-right{right:.2em}.reveal .controls[data-controls-layout=edges] .navigate-up{top:.4em}.reveal .controls[data-controls-layout=edges] .navigate-down{bottom:2.3em}}.tippy-box[data-theme~=light-border]{background-color:#f5f4f3;color:#635a54;border-radius:4px;border:solid 1px #b0a7a0;font-size:.6em}.tippy-box[data-theme~=light-border] 
.tippy-arrow{color:#b0a7a0}.tippy-box[data-placement^=bottom]>.tippy-content{padding:7px 10px;z-index:1}.reveal .callout.callout-style-simple .callout-body,.reveal .callout.callout-style-default .callout-body,.reveal .callout.callout-style-simple div.callout-title,.reveal .callout.callout-style-default div.callout-title{font-size:inherit}.reveal .callout.callout-style-default .callout-icon::before,.reveal .callout.callout-style-simple .callout-icon::before{height:2rem;width:2rem;background-size:2rem 2rem}.reveal .callout.callout-titled .callout-title p{margin-top:.5em}.reveal .callout.callout-titled .callout-icon::before{margin-top:1rem}.reveal .callout.callout-titled .callout-body>.callout-content>:last-child{margin-bottom:1rem}.reveal .panel-tabset [role=tab]{padding:.25em .7em}.reveal .slide-menu-button .fa-bars::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-easel2::before{background-image:url('data:image/svg+xml,')}.reveal .slide-chalkboard-buttons .fa-brush::before{background-image:url('data:image/svg+xml,')}/*! 
light */.reveal ol[type=a]{list-style-type:lower-alpha}.reveal ol[type=a s]{list-style-type:lower-alpha}.reveal ol[type=A s]{list-style-type:upper-alpha}.reveal ol[type=i]{list-style-type:lower-roman}.reveal ol[type=i s]{list-style-type:lower-roman}.reveal ol[type=I s]{list-style-type:upper-roman}.reveal ol[type="1"]{list-style-type:decimal}.reveal ul.task-list{list-style:none}.reveal ul.task-list li input[type=checkbox]{width:2em;height:2em;margin:0 1em .5em -1.6em;vertical-align:middle}div.cell-output-display div.pagedtable-wrapper table.table{font-size:.6em}.reveal .code-annotation-container-hidden{display:none}.reveal code.sourceCode button.code-annotation-anchor,.reveal code.sourceCode .code-annotation-anchor{font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;color:var(--quarto-hl-co-color);border:solid var(--quarto-hl-co-color) 1px;border-radius:50%;font-size:.7em;line-height:1.2em;margin-top:2px;user-select:none;-webkit-user-select:none;-moz-user-select:none;-ms-user-select:none;-o-user-select:none}.reveal code.sourceCode button.code-annotation-anchor{cursor:pointer}.reveal code.sourceCode a.code-annotation-anchor{text-align:center;vertical-align:middle;text-decoration:none;cursor:default;height:1.2em;width:1.2em}.reveal code.sourceCode.fragment a.code-annotation-anchor{left:auto}.reveal #code-annotation-line-highlight-gutter{width:100%;border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2}.reveal #code-annotation-line-highlight{margin-left:-8em;width:calc(100% + 4em);border-top:solid var(--quarto-hl-co-color) 1px;border-bottom:solid var(--quarto-hl-co-color) 1px;z-index:2;margin-bottom:-2px}.reveal code.sourceCode .code-annotation-anchor.code-annotation-active{background-color:var(--quarto-hl-normal-color, #aaaaaa);border:solid var(--quarto-hl-normal-color, #aaaaaa) 1px;color:#f5f4f3;font-weight:bolder}.reveal 
pre.code-annotation-code{padding-top:0;padding-bottom:0}.reveal pre.code-annotation-code code{z-index:3;padding-left:0px}.reveal dl.code-annotation-container-grid{margin-left:.1em}.reveal dl.code-annotation-container-grid dt{margin-top:.65rem;font-family:SFMono-Regular,Menlo,Monaco,Consolas,"Liberation Mono","Courier New",monospace;border:solid #635a54 1px;border-radius:50%;height:1.3em;width:1.3em;line-height:1.3em;font-size:.5em;text-align:center;vertical-align:middle;text-decoration:none}.reveal dl.code-annotation-container-grid dd{margin-left:.25em}.reveal .scrollable ol li:first-child:nth-last-child(n+10),.reveal .scrollable ol li:first-child:nth-last-child(n+10)~li{margin-left:1em}html.print-pdf .reveal .slides .pdf-page:last-child{page-break-after:avoid}.reveal .quarto-title-block .quarto-title-authors{display:flex;justify-content:center}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author{padding-left:.5em;padding-right:.5em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:hover,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:visited,.reveal .quarto-title-block .quarto-title-authors .quarto-title-author a:active{color:inherit;text-decoration:none}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-name{margin-bottom:.1rem}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-email{margin-top:0px;margin-bottom:.4em;font-size:.6em}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-author-orcid img{margin-bottom:4px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation{font-size:.7em;margin-top:0px;margin-bottom:8px}.reveal .quarto-title-block .quarto-title-authors .quarto-title-author .quarto-title-affiliation:first{margin-top:12px}.reveal .slide 
blockquote{border-left:3px solid #b0a7a0;padding-left:.6em;background:#f6f7f7}.reveal h2{padding-bottom:.3em}.reveal .footer{color:#f9bd07;background-color:#2c2825;display:block;position:fixed;bottom:0px !important;padding-bottom:12px;padding-top:12px;width:100%;text-align:center;font-size:18px;z-index:2}.reveal .slide ul li{list-style:none}.reveal .slide ul li::before{content:"◼";color:#f9bd07;display:inline-block;width:1.5em;margin-left:-1.5em;font-size:66%}.reveal .progress span{background-color:#f9bd07}.reveal .slide-logo{z-index:3;bottom:-5px !important}.slide-background:first-child{background-color:#2c2825;background-image:radial-gradient(#635a54 1%, transparent 5%);background-position:0 0,10px 10px;background-size:30px 30px;background-repeat:repeat;height:100%;width:100%}.slide-background:first-child .slide-background-content{background-image:url("https://the-strategy-unit.github.io/assets/logo_yellow.svg");top:.5em;right:.5em;height:4em;width:4em;position:absolute}#title-slide{text-align:left}#title-slide h1{font-size:2em;color:#f9bd07 !important}#title-slide .subtitle{color:#c7c1bc}#title-slide .quarto-title-authors{justify-content:left;display:block}#title-slide .quarto-title-author{padding-top:1em;color:#ec6555;padding:0}#title-slide .quarto-title-author a{color:#5881c1}#title-slide p.institute{font-size:.75em;color:#c7c1bc}#title-slide p.date{font-size:.5em;color:#988d85}.reveal .slide-logo:first-child{display:none}.inverse .slide-background-content{background-color:#2c2825}.inverse h1,.inverse h2,.inverse h3,.inverse h4{color:#988d85 !important}.reveal .imitate-title{font-size:2em}.reveal .inverse .imitate-title{color:#f9bd07 !important}.reveal .inverse{color:#f5f4f3}.text-bottom{bottom:1em;position:absolute}.no-bullets ul{list-style-type:none;padding:0;margin:0}.small-table 
table{font-size:1.65rem}.small{font-size:1.2rem}.yellow{color:#f9bd07}.light-yellow{color:#fef2ce}.light-charcoal{color:#988d85}.very-light-charcoal{color:#c7c1bc}.center{text-align:center}/*# sourceMappingURL=f95d2bded9c28492b788fe14c3e9f347.css.map */
diff --git a/sitemap.xml b/sitemap.xml
index fe491ce..cb3e532 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,158 +2,162 @@
 <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.634Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/style/project_structure.html</loc>
-    <lastmod>2024-08-27T11:55:08.294Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.690Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/style/git_and_github.html</loc>
-    <lastmod>2024-08-27T11:55:08.294Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.690Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-23_coffee-and-coding/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.250Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-02-23_coffee-and-coding/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.634Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.290Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.690Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-23_collaborative-working/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.254Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-10-17_conference-check-in-app/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.274Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-09-05_earl-nhp/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.682Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-02_mlcsu-ksn-meeting/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.258Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-01-25_coffee-and-coding/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.666Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-05-30_open-source-licensing/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.286Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-08-22_agile-and-scrum/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.682Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_midlands-analyst-rap/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.250Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-24_coffee-and-coding_geospatial/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.650Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.266Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-23_nhs-r_unit-testing/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.650Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-05-16_store-data-safely/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.278Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_coffee-and-coding/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.634Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26_alternative_remotes.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-07-11_haca-nhp-demand-model/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-02-28_sankey_plot.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/azure_python.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-21-rstudio-tips/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.238Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26-reinstalling-r-packages.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-13_one-year-coffee-code.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-08-08-map-and-nest/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-24_hotfix-with-git.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
+  </url>
+  <url>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/about.html</loc>
-    <lastmod>2024-08-27T11:55:08.238Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.238Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-08-08-map-and-nest/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.634Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-24_hotfix-with-git.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26-reinstalling-r-packages.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-13_one-year-coffee-code.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-03-21-rstudio-tips/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-05-22-storing-data-safely/azure_python.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-17_nearest_neighbour.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-01-10-advent-of-code-and-test-driven-development.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2024-02-28_sankey_plot.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/blogs/posts/2023-04-26_alternative_remotes.html</loc>
+    <lastmod>2024-09-02T15:01:20.630Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-07-11_haca-nhp-demand-model/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.254Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-05-16_store-data-safely/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.670Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_coffee-and-coding/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-10-09_nhs-r_conf_sd_in_health_social_care/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.658Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-23_nhs-r_unit-testing/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.258Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-03-09_midlands-analyst-rap/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-24_coffee-and-coding_geospatial/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.258Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-05-30_open-source-licensing/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.678Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-08-22_agile-and-scrum/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.290Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-08-02_mlcsu-ksn-meeting/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.650Z</lastmod>
   </url>
   <url>
-    <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-01-25_coffee-and-coding/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.274Z</lastmod>
+    <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-10-17_conference-check-in-app/index.html</loc>
+    <lastmod>2024-09-02T15:01:20.666Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-05-15_text-mining/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.254Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-09-07_coffee_and_coding_functions/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.258Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.650Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-05-23_data-science-for-good/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.254Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.642Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2024-05-23_github-team-sport/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.282Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.674Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/presentations/2023-02-01_what-is-data-science/index.html</loc>
-    <lastmod>2024-08-27T11:55:08.242Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.634Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/style/data_storage.html</loc>
-    <lastmod>2024-08-27T11:55:08.294Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.690Z</lastmod>
   </url>
   <url>
     <loc>https://the-strategy-unit.github.io/data_science/style/style_guide.html</loc>
-    <lastmod>2024-08-27T11:55:08.294Z</lastmod>
+    <lastmod>2024-09-02T15:01:20.690Z</lastmod>
   </url>
 </urlset>
diff --git a/style/data_storage.html b/style/data_storage.html
index 4a39d85..d94b30a 100644
--- a/style/data_storage.html
+++ b/style/data_storage.html
@@ -2,7 +2,7 @@
-
+
diff --git a/style/git_and_github.html b/style/git_and_github.html
index 83fd36a..d170f97 100644
--- a/style/git_and_github.html
+++ b/style/git_and_github.html
@@ -2,7 +2,7 @@
-
+
diff --git a/style/project_structure.html b/style/project_structure.html
index bd1fce7..44955c6 100644
--- a/style/project_structure.html
+++ b/style/project_structure.html
@@ -2,7 +2,7 @@
-
+
diff --git a/style/style_guide.html b/style/style_guide.html
index 8c02449..aa689ce 100644
--- a/style/style_guide.html
+++ b/style/style_guide.html
@@ -2,7 +2,7 @@
-
+