---
title: Monte Carlo Simulations for Cost Benefit Analysis
subtitle: A discussion
format:
  clean-revealjs:
    self-contained: true
author:
  - name: Zac Payne-Thompson
    role: Head of Data Tools and Insights
    email: zachary.payne-thompson@dcms.gov.uk
    affiliations: The Department for Culture, Media and Sport
date: last-modified
---
```{r, echo=FALSE, results='hide'}
source("monte_carlo_cba.R")
```
## Contents
Pre-requisites
- Brief History of CBA
- Outline of the current workflow in the Green Book
- What do we mean by sensitivities?
Analysis
- Overview of distributions
- Monte Carlo analysis
- Interlude on Optimism Bias
- Distributional approaches
## Contents
Evaluation
- What would this mean in practice?
Reflection
- Does this matter?
# A Brief History of CBA {background-color="#40666e"}
## {background-image="images/Screenshot 2023-11-02 at 08.43.01.png" background-size="50%"}
::: {.notes}
Cost-Benefit Analysis, or CBA, is a powerful decision-making tool that quantifies the costs and benefits of various projects or policies. To understand its historical context, we need to go back to the 20th century, a period marked by significant changes in the role of central governments.
In the United States and elsewhere, central governments grew substantially during this time, and this growth played a crucial role in the evolution of CBA. It was in 1936, during the New Deal era in the United States, that CBA was first employed. Congress ordered government agencies to use CBA to evaluate projects related to flood control.
The use of CBA became increasingly popular among administrative agencies as the federal government continued to expand. But to truly appreciate the development of CBA, we must also consider the influence of Progressivism and the birth of welfare economics. These ideological shifts separated value-laden politics from the realm of administrative expertise grounded in scientific principles.
:::
## Historical Development of CBA {background-image="images/Franklin-D-Roosevelt-Henry-Wallace-farm-relief-bill-1933.webp" background-size="50%"}
### The New Deal Era
::: {.notes}
The New Deal government's adoption of CBA in 1936 was a pivotal moment in its history, as it marked the first official use of CBA in a government context.
As the central government in the United States and other countries continued to grow, CBA gained rapid popularity among administrative agencies. This growth was not isolated; it was influenced by the rise of Progressivism during the late 19th and early 20th centuries. Progressives believed in the separation of politics, driven by values, from administrative decisions grounded in scientific principles.
This ideological shift paved the way for the development of modern welfare economics, which supplied the scientific principles necessary for the implementation of CBA. Early welfare economists believed that economic concepts could rationalise government policies, and their efforts gained further momentum in the 1950s and 1960s when governments sought technical assistance in developing formal CBA procedures.
:::
## Addressing the issues of Pareto Efficiency
### The Hicks-Kaldor Compensation Principle
:::{.callout-note}
## Definition
An allocative change increases efficiency if the gainers from the change are capable of compensating the losers and still coming out ahead.
Each individual's gain or loss is defined as the value of a hypothetical monetary compensation that would keep each individual (in his or her own judgement) indifferent to the change.
Cost-benefit analysis examines whether policy changes satisfy the compensation principle.
:::
::: {.notes}
Now, let's explore the evolution of the principles underlying Cost-Benefit Analysis. Vilfredo Pareto played a crucial role by introducing the Pareto principle, which stated that a project is desirable if it makes at least one person better off without making anyone else worse off. While this principle laid the foundation for CBA, it was soon recognized as being too stringent in practice.
This led to the introduction of compensation tests by the economists Nicholas Kaldor and John Hicks. Compensation tests proposed that a project is desirable if its beneficiaries are enriched enough that they could overcompensate those who are hurt by the project. These tests significantly expanded the range of projects that could be evaluated using CBA and ultimately became the basis for modern CBA.
:::
## Addressing the issues of Pareto Efficiency {background-image="images/Screenshot 2023-11-02 at 09.04.37.png" background-size="50%"}
### The Hicks-Kaldor Compensation Principle
::: {.notes}
Now, let's explore the evolution of the principles underlying Cost-Benefit Analysis. Vilfredo Pareto played a crucial role by introducing the Pareto principle, which stated that a project is desirable if it makes at least one person better off without making anyone else worse off. While this principle laid the foundation for CBA, it was soon recognized as being too stringent in practice.
This led to the introduction of compensation tests by the economists Nicholas Kaldor and John Hicks. Compensation tests proposed that a project is desirable if its beneficiaries are enriched enough that they could overcompensate those who are hurt by the project. These tests significantly expanded the range of projects that could be evaluated using CBA and ultimately became the basis for modern CBA.
:::
## CBA in Practice
### CBA Popularity and Doubts in the 1960s and 1970s
::: incremental
- In the 1960s, Cost-Benefit Analysis (CBA) gained popularity, even though there was no clear consensus on its theoretical foundation.
- Government agencies and applied economists embraced CBA during this period.
- However, by the 1970s, doubts started to emerge regarding the utility of CBA, both theoretically and practically.
- These doubts were not just theoretical but also related to challenges and criticisms in applying CBA to real-world decision-making.
:::
## CBA in Practice
### The **Real** Practice of CBA
::: incremental
- While CBA is taught in textbooks with specific methodologies, its practical application in government agencies often differs.
- Agencies may adapt CBA to their specific needs, using it as a tool to rationalize decisions made for various reasons, including political and administrative considerations.
- It's not uncommon for agencies to deviate from standard CBA procedures without always providing a clear rationale.
- The actual practice of CBA can be influenced by external factors such as legal constraints, data availability, and practical limitations.
:::
# CBA in the Green Book {background-color="#40666e"}
## CBA in the Green Book
### The Appraisal Process
::: incremental
1) Define the Problem
2) Establish Objectives
3) Identify Options
4) Appraise Options
5) Sensitivity Analysis
6) Decision Making
7) Implementation and Monitoring
:::
::: {.notes}
Introduction
- The Green Book is the UK Government's guide for conducting appraisals of public sector projects and policies.
- The appraisal process is designed to assess the economic, financial, social, and environmental impacts of proposed initiatives.
Key Steps in Appraisal
1. Define the Problem
- Clearly articulate the problem that the policy or project intends to address.
- Understand the underlying causes and set objectives.
2. Establish Objectives
- Define specific, measurable, and achievable objectives for each option.
- Objectives should cover economic, social, and environmental aspects.
3. Identify Options
- Develop a range of possible options to address the problem.
- Include a "do-nothing" option as a baseline for comparison.
4. Appraise Options
- Conduct a rigorous assessment of each option, including a detailed cost-benefit analysis.
- Consider the present value of costs and benefits over a specified timeframe.
- Use appropriate discount rates to account for the time value of money.
5. Sensitivity Analysis
- Test the robustness of the appraisal by varying key assumptions and parameters.
- Assess how changes in inputs impact the results.
6. Decision Making
- Compare the outcomes of the different options, considering not only financial but also social and environmental impacts.
- Make informed decisions based on the appraisal results.
7. Implementation and Monitoring
- Once a decision is made, implement the chosen option.
- Establish a monitoring framework to track the actual performance against the expected outcomes.
Conclusion
- The Green Book's appraisal process ensures that public sector decisions are based on a comprehensive and systematic evaluation of options.
- It promotes transparency, accountability, and evidence-based policymaking.
:::
## CBA in the Green Book
### Sensitivity Analysis
::: {.callout-note}
## Definition
- Sensitivity analysis explores how the expected outcomes of an intervention are sensitive to variations in key input variables.
- It helps understand the impact of changing assumptions on project feasibility and preferred options.
:::
A key concept is the [Switching Value:]{.bg style="--col: #e64173"} The value at which a key input variable would need to change to switch from a recommended option to another or for a proposal not to receive funding.
Identifying switching values is crucial to decision-making.
## CBA in the Green Book
### Sensitivity Analysis
| Variable | Value |
|---------------------------------------|---------------------------|
| Site area | 39 acre |
| Existing use land value estimate | £30,659 per acre |
| Future use land value estimate | £200,000 per acre |
| Land value uplift per acre | £169,341 per acre |
| Total land value uplift | £6.6m |
| Wider social benefits | £1.4m |
| Present Value Benefits (PVB) | £8m |
| Present Value Cost (PVC) | £10m |
| Benefit Cost Ratio (BCR = PVB / PVC) | 0.8 |
| Net Present Social Value (NPSV) | -£2m |
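The switching value for this example can be reproduced with a few lines of arithmetic, using the figures from the table above (a quick sketch; variable names are illustrative):

``` r
pvc <- 10e6              # Present Value Cost
wider_benefits <- 1.4e6  # Wider social benefits
acres <- 39              # Site area
existing_value <- 30659  # Existing use land value per acre

# Land value uplift per acre needed for benefits to equal costs
uplift_needed <- (pvc - wider_benefits) / acres

# Future land value per acre at which the NPSV switches sign
switching_value <- existing_value + uplift_needed
```

This gives an uplift of roughly £221,000 per acre and a future land value of roughly £251,000 per acre, the point at which the NPSV turns positive.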
::: {.notes}
Officials are appraising the treatment of a 39 acre contaminated land site, to be funded by a public sector grant. The remediation of the land would enable new businesses to move close to an existing cluster of businesses in a highly productive sector. The benefits of the intervention can be estimated by the change in the land value of the site (land value uplift). There is data on the current value and likely value of the land post remediation.
The total benefits are £8m when wider social benefits are added to the increase in land value as a result of the remediation. The costs of the remediation exceed the benefits so the BCR is less than 1 and the NPSV is negative. The switching value to turn the NPSV positive, so benefits outweigh costs, would be an approximate future land use value of £251,000 per acre equal to a land value uplift of approximately £221,000 per acre.
:::
## CBA in the Green Book
### Optimism Bias
::: {.callout-note}
## Definition
Optimism bias is the demonstrated systematic tendency for appraisers to be over-optimistic about key project parameters, including capital costs, operating costs, project duration and benefits delivery.
:::
::: incremental
- Adjust for optimism bias to provide a realistic assessment of project estimates.
- Adjustments should align with risk avoidance and mitigation measures, with robust evidence required before reductions.
- Apply optimism bias adjustments to operating and capital costs. Use confidence intervals for key input variables when typical bias measurements are unavailable.
:::
## CBA in the Green Book
### Monte Carlo Analysis
::: {.callout-note}
*Monte Carlo analysis is a simulation-based risk modelling technique that produces expected values and confidence intervals. The outputs are the result of many simulations that model the collective impact of a number of uncertainties.*
*It is useful when there are a number of variables with significant uncertainties, which have known, or reasonably estimated, independent probability distributions.*
*It requires a well estimated model of the likely impacts of an intervention and expert professional input from an operational researcher, statistician, econometrician, or other experienced practitioner.*
:::
::: {.notes}
The technique is useful where variations in key inputs are expected and where they are associated with significant levels of risk mitigation costs, such as flood prevention. This can be used to determine what level of investment might be required to deal with extreme events such as rainfall events, which will have a statistical likelihood.
:::
# Monte Carlo Simulations for CBA {background-color="#40666e"}
## Monte Carlo Simulations for CBA
### Data and Setup
| project_id | low | central | high |
|:---------:|:-------:|:-------:|:-------:|
| 1 | 64.37888| 159.9989| 223.8726|
| 2 | 89.41526| 133.2824| 296.2359|
| 3 | 70.44885| 148.8613| 260.1366|
| 4 | 94.15087| 195.4474| 424.4830|
| 5 | 97.02336| 148.2902| 471.6297|
| 6 | 52.27782| 189.0350| 288.0247|
| 7 | 76.40527| 191.4438| 236.4092|
| 8 | 94.62095| 160.8735| 684.4839|
| ... | ...| ...| ...|
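The `data` object used throughout is not constructed in this chunk; one way a table of this shape might be simulated (an assumption for illustration, not the original code) is:

``` r
set.seed(42)
n <- 30

# Simulated project-level cost estimates with low < central < high
data <- data.frame(
  project_id = 1:n,
  low        = runif(n, min = 50, max = 100),
  central    = runif(n, min = 100, max = 200),
  high       = runif(n, min = 200, max = 700)
)
```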
## Monte Carlo Simulations for CBA
### Data and Setup
::: incremental
- **Objective**: Create functions to generate different cost distributions based on user-specified parameters.
- **Process**:
- Each function generates a sequence of possible project-level costs based on user-defined "high" and "low" values.
- Depending on the chosen distribution assumption, a probability distribution function is applied to create a vector of probabilities.
- The `sample()` function is used to randomly sample cost values from the sequence, with replacement, using the assumed probability distribution.
:::
## Monte Carlo Simulations for CBA
### Data and Setup
::: incremental
- **Total Cost Distributions**:
- These functions are applied to the project dataset to calculate total costs.
- The result is a vector of possible total project costs that can be plotted as a distribution.
- This approach allows for the exploration of different cost scenarios and provides a basis for risk analysis in project management.
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
Project costs are modeled using a uniform distribution spanning low to
high.
``` r
uniform_1 <- function(low, high){
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
Project costs are modeled using a uniform distribution spanning low to
high.
``` r
uniform_1 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
Project costs are modeled using a uniform distribution spanning low to
high.
``` r
uniform_1 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Uniform Probability distribution function
distribution <- dunif(sequence, min = low, max = high)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
Project costs are modeled using a uniform distribution spanning low to
high.
``` r
uniform_1 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Uniform Probability distribution function
distribution <- dunif(sequence, min = low, max = high)
# Sampling from possible costs using the assumed distribution function
sample(x = sequence, size = 10000, replace = T, prob = distribution)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
```{r}
#| echo: false
uniform_1(low = data$low[1], high = data$high[1]) %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .), lwd = 1.5) +
theme_minimal() +
labs(title = "Project Cost Distribution",
subtitle = "Uniform",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central[1])), color = "red") +
geom_vline(aes(xintercept = sum(data$low[1])), color = "blue") +
geom_vline(aes(xintercept = sum(data$high[1])), color = "blue")
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 1) Uniform Distribution
```{r}
#| echo: false
mapply(uniform_1, data$low, data$high) %>%
rowSums() %>%
as.data.frame() %>%
ggplot() +
theme_minimal() +
geom_density(aes(x = .)) +
labs(title = "Cost Distribution",
subtitle = "Uniform",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central)), color = "red") +
geom_vline(aes(xintercept = sum(data$low)), color = "blue") +
geom_vline(aes(xintercept = sum(data$high)), color = "blue")
```
::: {.notes}
Applying the function to the data and summing each row gives the total cost across 10,000 simulations.
Due to the central limit theorem, this produces an approximately normally distributed total cost estimate.
Because only the high and low estimates were used to model the project-level distributions, the total cost does not reflect any skew implied by the central estimate.
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
Project costs are modeled using a normal distribution with a mean defined as the midpoint between high and low, and a standard deviation equal to one quarter of the distance between high and low.
This means that, if costs are truly normally distributed, the low and high estimates correspond approximately to a 95% confidence interval for an individual project's cost.
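Setting the standard deviation to one quarter of the range places the low and high estimates two standard deviations either side of the mean, which can be checked directly:

``` r
# An interval of +/- 2 standard deviations around the mean covers
# roughly 95% of a normal distribution
coverage <- pnorm(2) - pnorm(-2)
round(coverage, 3)  # approximately 0.954
```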
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the midpoint between low and high
mean_x = (high-low)/2+low
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the midpoint between low and high
mean_x = (high-low)/2+low
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the midpoint between low and high
mean_x = (high-low)/2+low
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
# Normal Probability Distribution Function
distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
This function looks like:
``` r
normal_2 <- function(low, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the midpoint between low and high
mean_x = (high-low)/2+low
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
# Normal Probability Distribution Function
distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
# Sampling from possible costs using the assumed distribution function
sample(x = sequence, size = 10000, replace = T, prob = distribution)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
```{r}
#| echo: false
normal_2(low = data$low[1], high = data$high[1]) %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .)) +
theme_minimal() +
labs(title = "Project Cost Distribution",
subtitle = "Normal (without central)",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central[1])), color = "red")
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 2) Normal Distribution (without a central estimate)
```{r}
#| echo: false
mapply(normal_2, data$low, data$high) %>%
rowSums() %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .)) +
theme_minimal() +
labs(title = "Total Cost Distribution",
subtitle = "Normal (without central)",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central)), color = "red") +
geom_vline(aes(xintercept = sum(data$low)), color = "blue") +
geom_vline(aes(xintercept = sum(data$high)), color = "blue")
```
::: {.notes}
This provides a normally distributed total cost estimate that is tighter than when sampling from uniformly distributed project-level costs.
Again, this does not involve the central cost estimate, so it gives a normal distribution centred on the midpoint between low and high, but with a narrower confidence interval than in example 1).
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the central project cost estimate
mean_x = central
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the central project cost estimate
mean_x = central
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the central project cost estimate
mean_x = central
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
# Normal Probability Distribution Function
distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
As before, except the mean of the normal distribution is assumed to be the central value.
``` r
normal_3 <- function(low, central, high){
# Set of possible costs
sequence <- seq(from = 0, to = sum(data$high), by = 1)
# Mean equal to the central project cost estimate
mean_x = central
# Standard Deviation equal to 1/4 of the distance between low and high
sd_x = (high-low)/4
# Normal Probability Distribution Function
distribution <- dnorm(sequence, mean = mean_x, sd = sd_x)
# Sampling from possible costs using the assumed distribution function
sample(x = sequence, size = 10000, replace = T, prob = distribution)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
```{r}
#| echo: false
normal_3(low = data$low[1],
central = data$central[1],
high = data$high[1]) %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .)) +
theme_minimal() +
labs(title = "Project Cost Distribution",
subtitle = "Normal (with central)",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central[1])), color = "red") +
geom_vline(aes(xintercept = sum(data$low[1])), color = "blue") +
geom_vline(aes(xintercept = sum(data$high[1])), color = "blue")
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 3) Normal Distribution (including a central estimate)
```{r}
#| echo: false
mapply(normal_3, data$low, data$central, data$high) %>%
rowSums() %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .)) +
theme_minimal() +
labs(title = "Total Cost Distribution",
subtitle = "Normal (with central)",
y = "Likelihood",
x = "Total Cost (£)") +
geom_vline(aes(xintercept = sum(data$central)), color = "red") +
geom_vline(aes(xintercept = sum(data$low)), color = "blue") +
geom_vline(aes(xintercept = sum(data$high)), color = "blue")
```
::: {.notes}
This provides a normally distributed total cost estimate which is anchored to the central cost estimate.
Due to the central limit theorem, this is still a symmetric cost distribution and treats low and high estimates as cost limits.
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
::: incremental
- Are costs and benefits really normally distributed?
- By definition, they can only be positive.
- But the upper limit could be infinite?
- What is the *real* benefit of Net Zero, e.g. the existence of the human race?
- Similarly, what would be the cost of a race of hostile aliens enslaving humanity?
- In either case - probably a lot!
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
- Are costs and benefits really normally distributed?
- By definition, they can only be positive.
- But the upper limit could be infinite?
- What is the *real* benefit of Net Zero, e.g. the existence of the human race?
- Similarly, what would be the cost of a race of hostile aliens enslaving humanity?
- In either case - probably a lot!
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
::: {.callout-note}
## A solution
The Log-Normal distribution allows for a right skew and long upper tail while using the same input parameters as a normal distribution.
:::
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
In the context of cost estimation for a project, we can leverage the Cumulative Density Function (CDF) of the Log-Normal distribution to calculate the mu (μ) and sigma (σ) parameters required to achieve a distribution where approximately 95% of estimates fall between the low and high cost estimates.
To achieve this, we need to establish a relationship between our central project cost estimate and the relevant formula. However, this approach relies on an assumption about what the central estimate represents.
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
::: {.callout-note}
One potential statistic that relates our three project cost estimates to the distribution parameters is the mode.
Assuming that the central cost estimate represents the most likely outcome, it corresponds to the peak of the probability distribution, making it the mode.
:::
The mode of the Log-Normal distribution is given by the formula:
$$mode = e^{\mu - \sigma^2} = central$$
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
Solving for mu (μ) gives us:
$$\mu = \log(mode) + \sigma^2 = \log(central) + \sigma^2$$
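The mode relationship can be sanity-checked numerically by locating the peak of the log-normal density on a fine grid (a sketch with arbitrary illustrative parameters, not values from the project data):

``` r
mu <- 0.5
sigma <- 0.3  # illustrative values

# Locate the density peak on a fine grid
x <- seq(0.01, 5, by = 0.001)
mode_numeric <- x[which.max(dlnorm(x, meanlog = mu, sdlog = sigma))]

# Compare against the closed-form mode
mode_formula <- exp(mu - sigma^2)
```

The two agree to within the grid resolution.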
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
::: incremental
- Therefore, we need to find the value of sigma (σ) that results in approximately 95% of our project cost estimates falling between the high cost and low cost estimates.
- This can be calculated by finding the difference between the Log-Normal CDF evaluated at the high cost estimate and the low cost estimate.
- For a practical illustration, we can utilize the data from the first project.
:::
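The steps above amount to a one-dimensional search problem:

$$\hat{\sigma} = \underset{\sigma > 0}{\arg\min} \left| F(high;\, \mu(\sigma), \sigma) - F(low;\, \mu(\sigma), \sigma) - 0.95 \right|, \qquad \mu(\sigma) = \log(central) + \sigma^2$$

where $F$ is the Log-Normal CDF.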
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
First, defining an objective function for sigma (σ)
``` r
f <- function(sigma){
# The relationship between mode (central), mu and sigma
mu <- log(data$central[1]) + sigma^2
# The difference between the CDF at high and CDF at low where 95%
# of estimates fall
abs(plnorm(data$high[1], mu, sigma) - plnorm(data$low[1], mu, sigma) - 0.95)
}
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
# Plugging this back into the formula for mu
mu_test <- (log(data$central[1]) + sigma_test^2)
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
# Plugging this back into the formula for mu
mu_test <- (log(data$central[1]) + sigma_test^2)
# Now using these to simulate a distribution
N <- 10000000
nums <- rlnorm(N, mu_test, sigma_test)
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
# Plugging this back into the formula for mu
mu_test <- (log(data$central[1]) + sigma_test^2)
# Now using these to simulate a distribution
N <- 10000000
nums <- rlnorm(N, mu_test, sigma_test)
# Now testing what proportion of values lies between Low and High
sum(data$low[1] < nums & nums < data$high[1]) / N
```
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
``` r
# Next using optimize to search the interval from lower to upper for a
# minimum of the function f with respect to the first argument, sigma.
optimize(f, lower = 0, upper = 1)
# Selecting the minimum from the returned list; this is the optimal sigma
sigma_test <- optimize(f, lower = 0, upper = 1)$minimum
# Plugging this back into the formula for mu
mu_test <- (log(data$central[1]) + sigma_test^2)
# Now using these to simulate a distribution
N <- 10000000
nums <- rlnorm(N, mu_test, sigma_test)
# Now testing what proportion of values lies between Low and High
sum(data$low[1] < nums & nums < data$high[1]) / N
```
```{r}
sum(data$low[1] < nums & nums < data$high[1]) / N
```
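As a cheaper cross-check than simulating ten million draws, the same coverage can be computed exactly from the fitted CDF (reusing `mu_test` and `sigma_test` from the earlier slides):

``` r
# Exact probability mass between the low and high estimates; ~0.95
plnorm(data$high[1], mu_test, sigma_test) - plnorm(data$low[1], mu_test, sigma_test)
```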
## Monte Carlo Simulations for CBA {auto-animate="true"}
### 4) Log-Normal Distribution
```{r}
#| echo: false
log_normal_4(low = data$low[1],
central = data$central[1],
high = data$high[1]) %>%
as.data.frame() %>%
ggplot() +
geom_density(aes(x = .), lwd=1.5) +
theme_minimal() +
labs(title = "Project Cost Distribution",
subtitle = "Log-Normal",
y = "Likelihood",
x = "Total Cost") +
geom_vline(aes(xintercept = sum(data$central[1])), color = "red") +