---
title: "Lesson 18: Inference for Two Proportions"
output:
  html_document:
    theme: cerulean
    toc: true
    toc_float: false
---
<script type="text/javascript">
function showhide(id) {
var e = document.getElementById(id);
e.style.display = (e.style.display == 'block') ? 'none' : 'block';
}
</script>
<div style="width:50%;float:right;">
#### Optional Videos for this Lesson {.tabset .tabset-pills}
##### Part 1
<iframe id="kaltura_player_1652884135" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1652884135&entry_id=1_bjjaoc4w" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 2
<iframe id="kaltura_player_1652884583" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1652884583&entry_id=1_17yv0ikl" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 3
<iframe id="kaltura_player_1652884921" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1652884921&entry_id=1_nl5guar5" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 4
<iframe id="kaltura_player_1652885278" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1652885278&entry_id=1_21835n7j" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 5
<iframe id="kaltura_player_1652885689" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1652885689&entry_id=1_3aq5pv3j" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
</div><div style="clear:both;"></div>
## Lesson Outcomes
By the end of this lesson, you should be able to do the following.
**Regarding Confidence Intervals for a comparison of two proportions:**
* Calculate and interpret a confidence interval for a comparison of two proportions given a confidence level.
* Identify a point estimate and margin of error for the confidence interval.
* Show the appropriate connections between the numerical and graphical summaries that support the confidence interval.
* Check the requirements for the confidence interval.
**Regarding Hypothesis Testing for a comparison of two proportions:**
* State the null and alternative hypotheses for the chosen test.
* Calculate the test-statistic and p-value of the hypothesis test.
* Assess the statistical significance by comparing the p-value to the α-level.
* Check the requirements for the hypothesis test.
* Show the appropriate connections between the numerical and graphical summaries that support the hypothesis test.
* Draw a correct conclusion for the hypothesis test.
<br>
## Hypothesis Tests
<img src="./Images/StepsAll.png">
### Another Taste of PTC
<img src="./Images/Step1.png">
<img src="./Images/Phenylthiocarbamide-3D-balls.png">
The ability to taste the chemical Phenylthiocarbamide (PTC) is hereditary. Some people can taste it, while others cannot.
Even though the ability to taste PTC was observed in all age, race, and sex groups, this does not tell us whether men or women are more likely to be able to taste PTC.
Further exploration of the PTC data allows us to investigate if there is a difference in the proportion of men and women who can taste PTC. The following contingency table summarizes Elise Johnson's results<!--<cite>Johnson06</cite>-->:

Can Taste PTC?    Female    Male    Total
----------------  --------  ------  -------
No                15        14      29
Yes               51        38      89
Total             66        52      118

: **Gender Data Table**
<!-- {| class="basic" style="text-align:center" -->
<!-- |+ **Gender Data Table** -->
<!-- |- -->
<!-- ! Can Taste PTC? || Female || Male || Total -->
<!-- |- -->
<!-- ! No -->
<!-- | 15 || 14 || 29 -->
<!-- |- -->
<!-- ! Yes -->
<!-- || 51 || 38 || 89 -->
<!-- |- -->
<!-- ! Total -->
<!-- | 66 || 52 || 118 -->
<!-- |- -->
<!-- |} -->
These data are available in the file [PTCTasting](./Data/PTCtasting.xlsx). Note the way the data are organized in the file. One column gives the gender, another column indicates if the individual can taste PTC, and a third column gives counts for each group.
Researchers want to know if the ability to taste PTC is a sex-linked trait. This can be summarized in the following research question: Is there a difference in the proportion of men and the proportion of women who can taste PTC? The hypothesis is that there is no difference in the true proportion of men who can taste PTC compared to the true proportion of women who can taste PTC.
<img src="./Images/Step2.png">
A sample of 66 females and 52 males were provided with PTC strips and asked to indicate if they could taste the chemical or not. (This research was approved by the BYU-Idaho Institutional Review Board.)
<img src="./Images/Step3.png">
When working with categorical data, it is natural to summarize the data by computing proportions. If someone has the ability to taste PTC, we will call this a success. The sample proportion is defined as the number of successes observed divided by the total number of observations. For the females, the proportion of the sample who could taste the PTC was:
$$
\hat p_1 = \frac{x_1}{n_1} = \frac{51}{66}
$$
That is, approximately 77.3% of the females in the sample could taste PTC. For the males, the proportion who could taste PTC was:
$$
\hat p_2 = \frac{x_2}{n_2} = \frac{38}{52}
$$
This works out to be about 73.1%.
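The two sample proportions can also be checked with a few lines of code. The following is a minimal sketch in Python, which is not otherwise used in this lesson; the variable names are chosen only for illustration, and the counts come from the Gender Data Table above.

```python
# Counts from the Gender Data Table (a "success" is being able to taste PTC).
x1, n1 = 51, 66   # females: number who can taste PTC, sample size
x2, n2 = 38, 52   # males: number who can taste PTC, sample size

# Sample proportion = number of successes / total number of observations.
p_hat_1 = x1 / n1
p_hat_2 = x2 / n2

print(f"Females: {p_hat_1:.3f}")  # 0.773
print(f"Males:   {p_hat_2:.3f}")  # 0.731
```

The printed values agree with the 77.3% and 73.1% reported above.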
When working with data for two proportions, graphically displaying the data can help us compare each proportion. Pie charts and bar charts are essential tools for describing our data. The Math221 Statistics Toolbox automatically creates a side-by-side 100% stacked bar chart when you input the data into the "Two Proportions" tab.
<img src="./Images/PTC_Stacked_Bar_Toolbox.PNG">
Click on the link at right if you would like to learn how to create a side-by-side 100% stacked bar chart on your own in Excel.
<a href="javascript:showhide('ins')"><span style="font-size:8pt;">Show/Hide Instructions</span></a>
<div id="ins" style="display:none;">
<!-- To access this content, scroll to the bottom of the editing page and click on the link "Software:(Excel or SPSS)-(PageName)" -->
<!-- {{Software_Filter | PageName=SideBySidePieCharts}} -->
<div class="SoftwareHeading">Excel Instructions</div>
<div class="Summary">
**To create 100% Stacked Bar Charts in Excel, do the following:**
For this example we will continue to use the PTCTasting data set.
- First, re-arrange the summary of the data so that it looks like the table above, but without a row and column for total.
- Highlight the counts and the row and column headings. Now click on the "Insert" tab and select the 3rd option under the "2D Column" heading.
<center>
<img src="./Images/PTC_Stacked_Bar_Creation.PNG">
</center>
<br>
- Your graph would look like the image below. You can further customize it by adding data labels, titles, changing colors, etc.
<center>
<img src="./Images/PTC_Stacked_bar.PNG">
</center>
<br>
</div>
<br>
</div>
<br>
<img src="./Images/Step4.png">
The null and alternative hypotheses for a test of equality of two proportions are:
$$
\begin{array}{rl}
H_0: & p_1 = p_2 \\
H_a: & p_1 \ne p_2 \\
\end{array}
$$
If the null hypothesis is true, then the proportion of females who can taste PTC is the same as the proportion of males who can taste PTC.
<span id="Equation"></span>
The test statistic is a $z$-score, given by:
$$
z = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } }
$$
where
$$
\begin{array}{lll}
n_1= \text{sample size for group 1:} & n_1 = 66 & \text{(number of females)} \\
n_2= \text{sample size for group 2:} & n_2 = 52 & \text{(number of males)} \\
\hat p_1= \text{sample proportion for group 1:} & \hat p_1 = \frac{x_1}{n_1} = \frac{51}{66} & \text{(proportion of females who can taste PTC)}\\
\hat p_2= \text{sample proportion for group 2:} ~ & \hat p_2 = \frac{x_2}{n_2} = \frac{38}{52} & \text{(proportion of males who can taste PTC)}\\
\hat p= \text{overall sample proportion:} & \hat p = \frac{x_1+x_2}{n_1+n_2} = \frac{89}{118} & \text{(overall proportion who can taste PTC)}\\
\end{array}
$$
Substituting these values into the equation for the test statistic, $z$, we get:
$$
\begin{align}
z & = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } }
\\
& = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( 0 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\
& ~ ~ ~ ~ ~ \textrm{In the null hypothesis, we assumed that} ~ p_1=p_2. \\
& ~ ~ ~ ~ ~ \textrm{Or after subtracting,} ~ p_1-p_2=0 \\
& ~ ~ ~ ~ ~ \textrm{So, we substituted} ~ 0 ~ \textrm{for} ~ p_1-p_2 ~ \text{in the previous step.} \\
& = \frac{ \left( \frac{51}{66} - \frac{38}{52} \right) - (0) }{ \sqrt{\frac{89}{118} \left( 1-\frac{89}{118} \right) \left( \frac{1}{66} + \frac{1}{52} \right) } } \\
& = 0.526 \\
\end{align}
$$
The test statistic is $z=0.526$. Under the null hypothesis, this follows a standard normal distribution. So, we can use the Normal Probability applet to compute the $P$-value. We are conducting a two-sided test, so we will shade both tails in the applet.
<center>
<img src="./Images/ShadeBothZ-0-5256.png">
</center>
Since $P\textrm{-value} = 0.599 > 0.05 = \alpha$, we fail to reject the null hypothesis. In English we say, there is insufficient evidence to suggest that the true proportion of males who can taste PTC is different from the true proportion of females who can taste PTC.
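The hand calculation above can be reproduced in code. This is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` as a stand-in for the Normal Probability applet; the variable names are illustrative and are not part of the Math 221 Statistics Toolbox.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the Gender Data Table.
x1, n1 = 51, 66   # females who can taste PTC, number of females
x2, n2 = 38, 52   # males who can taste PTC, number of males

p_hat_1 = x1 / n1
p_hat_2 = x2 / n2
p_hat = (x1 + x2) / (n1 + n2)   # overall (pooled) sample proportion

# Under H0 we assume p1 - p2 = 0, so 0 is subtracted in the numerator.
standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = ((p_hat_1 - p_hat_2) - 0) / standard_error

# Two-sided test: shade both tails, so double the area beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))

print(f"z = {z:.3f}")              # z = 0.526
print(f"P-value = {p_value:.3f}")  # P-value = 0.599
```

Because the alternative hypothesis is $p_1 \ne p_2$, the sketch doubles the area in one tail; a one-sided test would use a single tail instead.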
<img src="./Images/Step5.png">
There is not enough evidence to say that one gender is able to taste PTC more than the other. Based on these data, we cannot conclude that men and women differ in the proportion who can taste PTC.
These data do not support the claim that the ability to taste PTC is a sex-linked trait.
### Using Excel to perform these calculations
<!-- {{Content:Excel/SPSS-Two Proportion HT}} -->
Just like we did for one proportion, we will use the [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx) to perform hypothesis tests for two proportions.
<span class="Custom">Click on the link at right</span> for instructions on using this spreadsheet to perform hypothesis testing.
<a href="javascript:showhide('excelinstructions')"><span style="font-size:8pt;">Show/Hide Instructions</span></a>
<div id="excelinstructions" style="display:none;">
For this example we will consider the "PTC" data above.
- **Step 1**: Open the Excel file Math221 Statistics Toolbox and click on the "Two Proportions" tab at the bottom of the spreadsheet. Find the group of cells labeled "Numerical Summary". Here we will input the values of $x_1$, $n_1$, $x_2$, and $n_2$ from the PTC tasting data above.
<center>
<img src="./Images/Lesson_18_pic_1.PNG">
</center>
- The yellow boxes indicate the input spaces. These are the only cells into which you will enter data.
- **Confidence Interval**: Next, we indicate our desired level of confidence for our confidence interval. Down and to the right of where we entered in our counts is a block of cells labeled "Confidence Interval". There is a yellow cell there, I12, where we can choose a confidence level. By default the cell contains the value 95%. The confidence interval bounds are shown as well as the sample proportion, $\hat{p}$ and the margin of error.
<center>
<img src="./Images/Lesson_18_pic_2.PNG">
</center>
<br>
- **Hypothesis Test**: To conduct a hypothesis test we must define our null and alternative hypotheses. To the right of the "Numerical Summary" cells is a block of cells labeled "Hypothesis Test". The null hypothesis is always the same and does not need to be updated. For the alternative hypothesis we can change the yellow cell, K6, to select the type of alternative hypothesis we wish to test, i.e. "Greater Than", "Less Than", or "Not Equal To."
The results of the hypothesis test are given immediately below where we define the hypotheses:
<center>
<img src="./Images/Lesson_18_pic_3.PNG">
</center>
<br>
- Compare the $z$-score and $P$-value with the ones you calculated by hand. They're the same!
<br>
</div>
<br>
<img src="./Images/StepsAll.png">
### Mortality Rates and Day of Admission: Aortic Aneurysms
Some people have claimed that mortality (death) rates are higher for patients admitted to a hospital on a weekend compared to patients admitted on a weekday. Researchers Chaim Bell and Donald Redelmeier analyzed admission data from hospital emergency rooms in Ontario, Canada<!--<cite>Bell01</cite>-->.
<center>
<img src="./Images/AneurysmAorta-marked.png">
</center>
The aorta is a major artery that takes oxygen-rich blood from the heart to the entire body. In some patients, this artery can swell like a balloon and burst. If this occurs in the abdomen, the technical term for the event is a ruptured abdominal aortic aneurysm. Although this condition is treatable, it requires immediate action, or the patient will die rapidly.
<img src="./Images/Step1.png">
The problem is that the quality of care in an emergency care facility may differ at different times of the week. Doctors Bell and Redelmeier hypothesized that the probability that a patient with an aortic aneurysm will die is greater if they are admitted to a hospital on a weekend compared to a weekday.
**Hypothesis:**
The proportion of patients with a ruptured abdominal aortic aneurysm who will die is greater on the weekends than on weekdays.
<img src="./Images/Step2.png">
To test this claim, the researchers accessed medical records for patients admitted to the emergency departments of hospitals in Ontario, Canada. For patients admitted with an aortic aneurysm, they recorded whether the admission occurred on a weekday or a weekend and whether the patient died or survived.
Data representative of their results are given below <!--<cite>Bell01</cite>-->.

Outcome      Weekday Admission    Weekend Admission
-----------  -------------------  -------------------
Died (x)     $x_1 = 1476$         $x_2 = 553$
Survived     $2669$               $756$
Total (n)    $n_1 = 4145$         $n_2 = 1309$

: **Aortic Aneurysm Outcomes**
<!-- {| class="basic" style="text-align:center" -->
<!-- |+ **Aortic Aneurysm Outcomes** -->
<!-- |- -->
<!-- ! Outcome || Weekday Admission || Weekend Admission -->
<!-- |- -->
<!-- ! Died (x) -->
<!-- | align="right" | $x_1 = 1476$ -->
<!-- | align="right" | $x_2 = 553$ -->
<!-- |- -->
<!-- ! Survived -->
<!-- | align="right" | $2669$ -->
<!-- | align="right" | $756$ -->
<!-- |- -->
<!-- ! Total (n) -->
<!-- | align="right" | $n_1 = 4145$ -->
<!-- | align="right" | $n_2 = 1309$ -->
<!-- |- -->
<!-- |} -->
<img src="./Images/Step3.png">
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
1. Use the data above to find the estimated proportion of patients admitted with an aortic aneurysm on a weekday who will die, $\hat p_1$.
<a href="javascript:showhide('Q1')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q1" style="display:none;">
<center>$\hat p_1 = 0.3561$</center>
</div>
<br>
2. Use the data to compute the estimated proportion of the patients admitted on a weekend that will die, $\hat p_2$.
<a href="javascript:showhide('Q2')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q2" style="display:none;">
<center>$\hat p_2 = 0.4225$</center>
</div>
<br>
3. What do you notice about $\hat p_1$ and $\hat p_2$?
<a href="javascript:showhide('Q3')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q3" style="display:none;">
<center>$\hat p_2~>~\hat p_1$ </center>
</div>
<br>
4. Without doing any more calculations, do you think that there is a significant increase in the death rates of patients admitted on a weekend compared to those admitted on a weekday? Justify your answer.
<a href="javascript:showhide('Q4')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q4" style="display:none;">
- It appears that those admitted on a weekend have a greater death rate than those who are admitted on a weekday, but we do not know if it is a statistically significant difference until we do a hypothesis test.
</div>
</div>
<br>
<img src="./Images/Lesson18_aortic_bar_chart.PNG">
<br>
<!-- #### Side-by-side Bar Charts -->
<!-- If the data are considered counts, then a side-by-side bar chart is usually the preferred plot. -->
<!-- To access this content, scroll to the bottom of the editing page and click on the link "Software:(Excel or SPSS)-(PageName)" -->
<!-- {{Software_Filter | PageName=SideBySideBarCharts }} -->
<!-- <div class="SoftwareHeading">Excel Instructions</div> -->
<!-- <div class="Summary"> -->
<!-- **To create side-by-side bar charts in Excel, do the following:** -->
<!-- - First, make sure the data are organized so that the first column gives the outcome (died or survived), the next column indicates the day of admittance (weekday or weekend) and the last column shows the counts. -->
<!-- - Then highlight the all three columns but just for the "Weekday" data. -->
<!-- - Now click on the "Insert" tab and then the "Chart" button, and select "column". Please select the most basic 2D column chart. -->
<!-- - Next you can repeat the process above but highlighting the "Weekend" data for all three columns. -->
<!-- - To keep the two pie charts straight, you can add a title. If "Chart Title" is already displayed on the graph, simply double-click it to type in a new title. If you need to add a title, click on the "Design" tab under Chart Tools. Next click on "Add Chart Element" and choose "Chart Title" from the list of options. Select "Above Chart" as the location of the title. -->
<!-- - You can now type in the Titles, "Aortic Aneurysm Outcomes Weekday Admittance" and "Aortic Aneurysm Outcomes Weekend Admittance" or something similar. -->
<!-- <br> -->
<!-- </div> -->
<!-- <br> -->
<img src="./Images/Step4.png">
<br>
<div class="message Tip">$p_1$ is the true proportion of deaths in group 1, the weekday group. $p_2$ is the true proportion of deaths in the weekend group.</div>
<br>
<br>
We now conduct a formal hypothesis test to determine if the mortality (death) rate is greater on a weekend compared to a weekday. First, we state the null and alternative hypotheses:
$$
\begin{align}
H_0: & p_1 = p_2 \\
H_a: & p_1 < p_2
\end{align}
$$
where group 1 represents the patients admitted on a weekday and group 2 represents patients admitted on a weekend. Note that if $p_1 < p_2$, then the risk of death is greater in group 2 than in group 1. We will use the 0.05 level of significance.
#### Checking Requirements for the Hypothesis Test
<br>
<div class="message Tip">When you check the requirements for this procedure, you are actually checking that you have at least 10 successes and 10 failures in both Group 1 and Group 2. All four conditions must be true in order to conduct the test.</div>
<br>
<br>
If the sample size is large in both groups, then we can use the normal distribution to compute the $P$-value. To check if the sample size is large enough, we need to check the following requirements:
$$
\begin{array}{rrr}
n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10
\end{array}
$$
If these requirements are satisfied, the $z$-statistic can be used to assess whether the true population proportions are equal or if the risk of death is greater on weekends.
Note that the requirements are all satisfied:
$$
\begin{array}{rr}
n_1 \cdot \hat p_1 = 4145 \cdot 0.3561 = 1476 \ge 10 & n_2 \cdot \hat p_2 = 1309 \cdot 0.4225 = 553 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) = 4145 \cdot 0.6439 = 2669 \ge 10 & n_2 \cdot \left(1-\hat p_2\right) = 1309 \cdot 0.5775 = 756 \ge 10
\end{array}
$$
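The four requirements amount to counting successes and failures in each group, since $n \cdot \hat p = x$ and $n \cdot (1-\hat p) = n - x$. A minimal Python sketch of that check follows; the function name `meets_requirements` is hypothetical and chosen only for this illustration.

```python
def meets_requirements(x, n, minimum=10):
    """Return True if a group has at least `minimum` successes and failures."""
    successes = x        # same as n * p_hat
    failures = n - x     # same as n * (1 - p_hat)
    return successes >= minimum and failures >= minimum


# Counts from the Aortic Aneurysm Outcomes table.
weekday_ok = meets_requirements(1476, 4145)   # 1476 deaths, 2669 survivals
weekend_ok = meets_requirements(553, 1309)    # 553 deaths, 756 survivals

# All four conditions must hold before the z-test is used.
print(weekday_ok and weekend_ok)  # True
```

Working with the raw counts avoids the small rounding error that comes from multiplying $n$ by a rounded value of $\hat p$.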
**Reminder:** $~ x_1 = 1476,~x_2 =553$, $n_1=4145,~n_2 = 1309$, $\hat p_1 = \frac{x_1}{n_1},~\hat p_2 = \frac{x_2}{n_2}$,
and $\hat p = \frac{x_1+x_2}{n_1+n_2}$.
The test statistic can be computed by following these steps:
- First, find the combined proportion of "successes." This is computed as:
$$
\hat p = \frac{x_1+x_2}{n_1+n_2} = \frac{1476+553}{4145+1309} = \frac{2029}{5454}
$$
- Next, enter the appropriate values into the equation for the $z$-score.
$$
\begin{array}{rcll}
z &=& \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\
&=& \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( 0 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } & \text{We assumed } p_1=p_2. \\
&=& \frac{ \left( \hat p_1 - \hat p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) } } \\
&=& \frac{ \left( \frac{1476}{4145} - \frac{553}{1309} \right) }{ \sqrt{\frac{2029}{5454} \left( 1-\frac{2029}{5454} \right) \left( \frac{1}{4145} + \frac{1}{1309} \right) } } \\
&=& -4.331
\end{array}
$$
(Make sure you can get this value using your calculator.)
<div class="Emphasis">
- **Remember...**
The alternative hypothesis determines which area in the tails of the $z$-distribution will be shaded as you calculate the $P$-value.
If the alternative hypothesis is $\ldots$
- $p_1 \ne p_2$, shade both tails.
- $p_1 < p_2$, shade the left tail.
- $p_1 > p_2$, shade the right tail.
<br>
</div>
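The shading rule above can be written as a small function. This is a sketch in Python under stated assumptions: `p_value_from_z` and the labels `"not equal"`, `"less"`, and `"greater"` are hypothetical names for this illustration, and the value $z = -1.5$ is an arbitrary example, not a result from this lesson.

```python
from statistics import NormalDist


def p_value_from_z(z, alternative):
    """Area under the standard normal curve matching the alternative hypothesis."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "not equal":   # p1 != p2: shade both tails
        return 2 * std_normal.cdf(-abs(z))
    if alternative == "less":        # p1 < p2: shade the left tail
        return std_normal.cdf(z)
    if alternative == "greater":     # p1 > p2: shade the right tail
        return 1 - std_normal.cdf(z)
    raise ValueError("alternative must be 'not equal', 'less', or 'greater'")


z_example = -1.5  # arbitrary illustrative value
for alt in ("not equal", "less", "greater"):
    print(alt, round(p_value_from_z(z_example, alt), 4))
```

The same $z$-score gives three different $P$-values, which is why the alternative hypothesis must be chosen before the $P$-value is computed.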
<br>
This $z$-score can be substituted into the Normal Probability applet to find the $P$-value. Since the alternative hypothesis is that $p_1 < p_2$, we consider only the area to the left of $z=-4.331$. The applet gives this area (our $P$-value) as $7.42 \times 10^{-6} = 0.00000742$.
<center>
<img src="./Images/ShadeLeftZ-4-331.png">
</center>
Note that $P$-value$= 0.00000742 < 0.05 = \alpha$, so we reject the null hypothesis.
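The full one-sided test for the aortic aneurysm data can be sketched in code as well. The Python below is an illustration only, using `statistics.NormalDist` in place of the Normal Probability applet; the variable names are assumptions made for this example.

```python
from math import sqrt
from statistics import NormalDist

# Counts from the Aortic Aneurysm Outcomes table.
x1, n1 = 1476, 4145   # weekday admissions: deaths, total
x2, n2 = 553, 1309    # weekend admissions: deaths, total
alpha = 0.05          # level of significance

p_hat_1 = x1 / n1
p_hat_2 = x2 / n2
p_hat = (x1 + x2) / (n1 + n2)   # combined proportion of "successes"

# H0 assumes p1 = p2, so the hypothesized difference is 0.
standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
z = (p_hat_1 - p_hat_2) / standard_error

# Ha: p1 < p2, so the P-value is the area to the left of z.
p_value = NormalDist().cdf(z)

print(f"z = {z:.3f}")  # z = -4.331
print(f"P-value = {p_value:.2e}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

The computed $z$-score matches the hand calculation, and the left-tail area should agree with the applet's value up to rounding.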
<img src="./Images/Step5.png">
There is sufficient evidence to suggest that the true proportion of patients who die from an aortic aneurysm is greater on the weekends than on the weekdays. This is consistent with differences in the quality of care available to patients on the weekends compared to patients on weekdays, although the test by itself does not identify the cause of the difference.
<img src="./Images/StepsAll.png">
### Mortality Rates and Day of Admission: Heart Attacks
<img src="./Images/Acute_myocardial_infarction_with_contraction_band_necrosis.png">
Heart attacks are a leading cause of death in many areas of the world.
The study by Doctors Bell and Redelmeier included an analysis of the risk of dying of a heart attack after admission to a hospital. The researchers reported the following outcomes, depending on whether the patient was admitted on a weekday or a weekend<!--<cite>Bell01</cite>-->.

Outcome      Weekday Admission    Weekend Admission
-----------  -------------------  -------------------
Died (x)     17,113               6,289
Survived     100,596              36,222
Total (n)    117,709              42,511

: **Heart Attack Outcomes**
<!-- {| class="basic" style="text-align:center" -->
<!-- |+ **Heart Attack Outcomes** -->
<!-- ! Outcome !! Weekday Admission !! Weekend Admission -->
<!-- |- -->
<!-- !Died (x) -->
<!-- | 17,113 || 6,289 -->
<!-- |- -->
<!-- !Survived -->
<!-- | 100,596 || 36,222 -->
<!-- |- -->
<!-- !Total (n) -->
<!-- | 117,709 || 42,511 -->
<!-- |- -->
<!-- |} -->
In this section, you will conduct a hypothesis test to determine if the proportion of patients who die of a heart attack is greater on weekends than on weekdays. Use the 0.05 level of significance.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
<img src="./Images/Step1.png">
5. Summarize the relevant background information.
<a href="javascript:showhide('Q5')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q5" style="display:none;">
- This was a study conducted by Doctors Bell and Redelmeier in which they compared the death rates of heart attack patients admitted to a hospital on a weekday with those of patients admitted on a weekend.
</div>
<br>
<img src="./Images/Step2.png">
6. Describe the data collection process.
<a href="javascript:showhide('Q6')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q6" style="display:none;">
- The researchers collected data on patients admitted to hospitals for heart attacks and whether or not they died. Separate data was kept for weekday hospital admissions and weekend hospital admissions.
</div>
<br>
<img src="./Images/Step3.png">
7. The value of $\hat p_1$, the sample proportion of patients who died of a heart attack on a weekday, is $\hat p_1 = \frac{17113}{117709} = 0.14538$. Find the value of $\hat p_2$.
<a href="javascript:showhide('Q7')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q7" style="display:none;">
<center>$\displaystyle{\hat p_2 = \frac{6289}{42511}=0.14794}$</center>
</div>
<br>
8. Create a chart illustrating the data.
<a href="javascript:showhide('Q8')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q8" style="display:none;">
<img src="./Images/Lesson_18_heart_attack_weekend_graph.PNG">
</div>
<br>
9. Based on your answers to questions 7 and 8, does it appear that the risk of dying is greater if a heart attack occurs on a weekend than on a weekday?
<a href="javascript:showhide('Q9')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q9" style="display:none;">
- The sample proportions, $\hat p_1$ and $\hat p_2$, are very close. Visually, there does not appear to be a difference.
</div>
<br>
10. If the proportion of patients who die of a heart attack is greater on weekends than on weekdays, which of the following would best describe the relationship?
- A. $p_1 = p_2$
- B. $p_1 > p_2$
- C. $p_1 < p_2$
- D. $p_1 \ne p_2$
<a href="javascript:showhide('Q10')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q10" style="display:none;">
- $p_1 < p_2$ is correct. Note that group 1 is the weekday group and group 2 is the weekend group.
</div>
<br>
<img src="./Images/Step4.png">
11. Replace the circles ($\bigcirc$) in the following null and alternative hypotheses with two of the following symbols: $=$, $>$, $<$, $\ne$.
$$
\begin{array}{rl}
H_0: & p_1 \bigcirc p_2 \\
H_a: & p_1 \bigcirc p_2
\end{array}
$$
<a href="javascript:showhide('Q11')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q11" style="display:none;">
<center>
$$
\begin{array}{rl}
H_0: & p_1 = p_2 \\
H_a: & p_1 < p_2
\end{array}
$$
</center>
</div>
<br>
12. What are the requirements for this test?
<a href="javascript:showhide('Q12')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q12" style="display:none;">
<center>
$$
\begin{array}{rrr}
n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10
\end{array}
$$
</center>
</div>
<br>
13. Are the requirements for this hypothesis test satisfied? Justify your answer.
<a href="javascript:showhide('Q13')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q13" style="display:none;">
<center>
$$
\begin{array}{rrr}
n_1 \cdot \hat p_1 = 17113 \ge 10 && n_2 \cdot \hat p_2 = 6289 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) = 100596 \ge 10 && n_2 \cdot \left(1-\hat p_2\right) = 36222 \ge 10
\end{array}
$$
</center>
- Yes, all of the calculated quantities from the previous question are at least 10.
</div>
<br>
14. Write the equation for the test statistic, $z$.
<a href="javascript:showhide('Q14')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q14" style="display:none;">
<center>
$$
z = \frac{ \left( \hat p_1 - \hat p_2 \right) - \left( p_1 - p_2 \right) }{ \sqrt{\hat p \left( 1-\hat p \right) \left( \frac{1}{n_1} + \frac{1}{n_2} \right) }}
$$
</center>
</div>
<br>
15. Compute the value of the test statistic.
<a href="javascript:showhide('Q15')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q15" style="display:none;">
<center>$z=-1.278$</center>
</div>
<br>
16. Present a sketch of the sampling distribution, showing the test statistic and the $P$-value.
<a href="javascript:showhide('Q16')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q16" style="display:none;">
<img src="./Images/SamplingDistributionHeartAttack-Applet.png">
</div>
<br>
17. Find the $P$-value.
<a href="javascript:showhide('Q17')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q17" style="display:none;">
<center>$P\textrm{-value} = 0.1006$</center>
</div>
<br>
18. Compare the $P$-value to the level of significance. Which is smaller? Will you reject, or fail to reject, the null hypothesis?
<a href="javascript:showhide('Q18')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q18" style="display:none;">
<center>$P\textrm{-value} = 0.1006 > 0.05 = \alpha$</center>
</div>
<br>
19. What is your decision?
<a href="javascript:showhide('Q19')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q19" style="display:none;">
- Since the $P$-value is greater than $\alpha$, we fail to reject the null hypothesis.
</div>
<br>
20. Fill in the blanks in the following sentence:
- There is $\text{____________}$ evidence to suggest that the true proportion of patients who die of a heart attack is $\text{__________}$ on weekends than on weekdays.
<a href="javascript:showhide('Q20')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q20" style="display:none;">
- There is **insufficient** evidence to suggest that the true proportion of patients who die of a heart attack is **greater** on weekends than on weekdays.
</div>
<br>
<img src="./Images/Step5.png">
21. If you were to have a heart attack, would you be more concerned if it occurred on a weekend than on a weekday?
<a href="javascript:showhide('Q21')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q21" style="display:none;">
- No, there is insufficient evidence to conclude that the probability of dying of a heart attack is greater on weekends than on weekdays.
</div>
</div>
<br>
<img src="./Images/StepsAll.png">
<br>
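The test statistic and $P$-value from the heart attack questions above can also be checked numerically. The following is a minimal sketch in Python rather than the Math 221 Statistics Toolbox. The function name `two_proportion_z_test` is hypothetical, the counts are read from the requirement check ($x_1 = 17113$ and $n_1 = 17113 + 100596$ for weekdays, $x_2 = 6289$ and $n_2 = 6289 + 36222$ for weekends), and the left-tailed alternative $H_a: p_1 < p_2$ is assumed.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, left-tailed P-value) for H0: p1 = p2 versus Ha: p1 < p2."""
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2
    # Pooled (overall) sample proportion, used because H0 assumes p1 = p2.
    p_hat = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    # Under H0 the hypothesized difference p1 - p2 is zero.
    z = (p_hat_1 - p_hat_2) / standard_error
    p_value = NormalDist().cdf(z)  # area to the left of z
    return z, p_value


# Weekday patients are group 1 and weekend patients are group 2.
x1, n1 = 17113, 17113 + 100596
x2, n2 = 6289, 6289 + 36222

z, p_value = two_proportion_z_test(x1, n1, x2, n2)
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```

With these counts, the sketch should agree with the values reported in the solutions, $z = -1.278$ and a $P$-value of $0.1006$. For a right-tailed or two-tailed alternative, the last step would use the upper tail or twice the smaller tail instead.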
## Confidence Intervals: Managing Fox Populations
During the mid-1800s, European foxes were introduced to the Australian mainland. These predators have been responsible for the reduction or extinction of several species of native wildlife. <!--(Coates08)-->
<center>
<img src="./Images/RoyalBotanicGardensCranbourne-Flikr-flipsockgrrl.png" width=80%>
</center>
<br/>
The Royal Botanic Gardens Cranbourne is a 914-acre (370 ha) conservation reserve outside Melbourne, Australia. Predation by foxes has been an ongoing problem in the gardens. To reduce the risk to native species, a systematic program of killing foxes was implemented.
One way to monitor the presence of foxes is to look for fox tracks in specific sandy areas, called sand-pads. Before beginning a systematic effort to reduce the fox population, ecologists observed fox tracks in the sand-pads 576 out of the 950 times the sand-pads were observed. After eliminating some of the foxes, the ecologists observed fox tracks in the sand-pads 268 times out of the 1359 times they checked the sand-pads <!--<cite>Coates08</cite>-->. The ecologists want to know if there is a difference in the proportion of times fox tracks are observed before versus after the intervention to reduce the fox population.
One way to compare two proportions is to make a confidence interval for the difference in the proportions.
The equation for the confidence interval for the difference of two proportions may look a little daunting at first, but with some practice, it is not too difficult.
Before we compute the confidence interval, we first organize our data and calculate some statistics that will be useful later. We divide the data into two groups: before foxes were targeted (Group 1) and after (Group 2). Let $x_1$ and $x_2$ represent the number of times fox prints were observed in the sand-pads before and after the ecologists began systematically eliminating the foxes, respectively. Similarly, let $n_1$ and $n_2$ be the number of times the ecologists checked the sand-pads in the before and after periods, respectively.
<center>
**Fox Tracks Data**
| | Before Intervention | After Intervention | Combined Data |
|--------|---------------------|--------------------|---------------|
| **Fox Tracks Observed** | $x_1 = 576$ | $x_2 = 268$ | $x_1 + x_2 = 576 + 268 = 844$ |
| **Total Observations** | $n_1 = 950$ | $n_2 = 1359$ | $n_1 + n_2 = 950 + 1359 = 2309$ |
</center>
As we did in [Lesson 17: Inference for One Proportion](Lesson17.html), we compute $\hat p$ for each group.
For group 1:
$$
\hat p_1 = \frac{x_1}{n_1} = \frac{576}{950}
$$
For group 2:
$$
\hat p_2 = \frac{x_2}{n_2} = \frac{268}{1359}
$$
The confidence interval for the difference between two proportions combines all of the information above:
$$
\left(
\left( \hat p_1 -\hat p_2 \right)
- z^*
\sqrt{
\frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1}
+
\frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2}
}
,
~
\left( \hat p_1 -\hat p_2 \right)
+ z^*
\sqrt{
\frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1}
+
\frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2}
}
\right)
$$
<div class="note">
If you need to review how to find the value of $z^*$ for other confidence levels, see [Lesson 10: Inference for One Mean: Sigma Known (Confidence Interval)](Lesson10.html#a-little-more-precision).
</div>
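The value of $z^*$ can also be found with software. The following is a minimal sketch in Python, assuming the standard library's `statistics.NormalDist` in place of a table or the toolbox; the helper name `z_star` is hypothetical. It looks up the point on the standard normal curve that leaves half of the remaining area in each tail.

```python
from statistics import NormalDist


def z_star(confidence_level):
    """Critical value z* that leaves (1 - confidence_level) / 2 in each tail."""
    upper_tail_cutoff = 1 - (1 - confidence_level) / 2
    return NormalDist().inv_cdf(upper_tail_cutoff)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
```

For a 95% confidence level, this gives the familiar value of about 1.96.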
The lower bound for a 95% confidence interval for the difference of the proportions of times fox prints are observed in the sand-pads is:
$$
\displaystyle{
\left( \hat p_1 -\hat p_2 \right)
- z^*
\sqrt{
\frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1}
+
\frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2}
}
}
$$
$$
\displaystyle{
=
\left( \frac{576}{950} - \frac{268}{1359} \right)
- 1.96
\sqrt{
\frac{\frac{576}{950} \left( 1 - \frac{576}{950} \right)}{950}
+
\frac{\frac{268}{1359} \left( 1 - \frac{268}{1359} \right)}{1359}
}
}
$$
$$
\displaystyle{
=
0.372
}
$$
and the upper bound is:
$$
\displaystyle{
\left( \hat p_1 -\hat p_2 \right)
+ z^*
\sqrt{
\frac{\hat p_1 \left( 1 - \hat p_1 \right)}{n_1}
+
\frac{\hat p_2 \left( 1 - \hat p_2 \right)}{n_2}
}
}
$$
$$
\displaystyle{
=
\left( \frac{576}{950} - \frac{268}{1359} \right)
+ 1.96
\sqrt{
\frac{\frac{576}{950} \left( 1 - \frac{576}{950} \right)}{950}
+
\frac{\frac{268}{1359} \left( 1 - \frac{268}{1359} \right)}{1359}
}
}
$$
$$
\displaystyle{
=
0.447
}
$$
<div class="note">
Make sure you can compute these confidence intervals before reading on.
</div>
So, the 95% confidence interval for the difference in the proportions is:
$$
(0.372, 0.447)
$$
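The same arithmetic can be carried out in a few lines of code. The following is a minimal sketch in Python, not the toolbox's own method; the function name `two_proportion_ci` and its `confidence_level` parameter are hypothetical. It uses the unpooled standard error from the confidence interval formula above and finds $z^*$ from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p_hat_1 = x1 / n1
    p_hat_2 = x2 / n2
    # Critical value z* for the requested confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sqrt(
        p_hat_1 * (1 - p_hat_1) / n1 + p_hat_2 * (1 - p_hat_2) / n2
    )
    difference = p_hat_1 - p_hat_2
    return difference - margin, difference + margin


# Fox tracks data: before the intervention is group 1, after is group 2.
lower, upper = two_proportion_ci(576, 950, 268, 1359)
print(f"({lower:.3f}, {upper:.3f})")
```

With the fox tracks data, the printed bounds should match the interval $(0.372, 0.447)$ computed by hand.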
<div class="tip">
If we switched the labels of group 1 and group 2, the bounds of the confidence interval would change sign and trade places: $(-0.447, -0.372)$.
</div>
To interpret this confidence interval, we say, "We are 95% confident that the true *difference* in the proportions of times fox prints will appear in the sand-pads (before minus after) is between 0.372 and 0.447."
Notice that zero is not in this confidence interval, so zero is not a plausible value for $p_1 - p_2$. Based on this result, it is reasonable to conclude that the proportion of times foxes are observed in the sand-pads is not the same before and after the effort to reduce their population. It seems that the work to reduce the number of foxes is having an effect on their presence in the reserve.
As you may have guessed, the [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx) can be used to calculate confidence intervals for the difference of two proportions.
### Checking Requirements for a Confidence Interval
The requirements for computing a confidence interval for two proportions are the same as the requirements for doing a hypothesis test.
$$
\begin{array}{rrr}
n_1 \cdot \hat p_1 \ge 10 && n_2 \cdot \hat p_2 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) \ge 10 && n_2 \cdot \left(1-\hat p_2\right) \ge 10
\end{array}
$$
In this example, all of the requirements are satisfied:
$$
\begin{array}{rr}
n_1 \cdot \hat p_1 = 950 \cdot 0.606 = 576 \ge 10 & n_2 \cdot \hat p_2 = 1359 \cdot 0.197 = 268 \ge 10 \\
n_1 \cdot \left(1-\hat p_1\right) = 950 \cdot (1-0.606) = 374 \ge 10 & n_2 \cdot \left(1-\hat p_2\right) = 1359 \cdot (1-0.197) = 1091 \ge 10
\end{array}
$$
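Because $n_1 \cdot \hat p_1 = x_1$ and $n_1 \cdot \left(1-\hat p_1\right) = n_1 - x_1$, the requirements amount to having at least 10 observations with fox tracks and at least 10 without in each group. The following is a minimal sketch of that check in Python; the function name `requirements_met` and its `minimum` parameter are hypothetical.

```python
def requirements_met(x1, n1, x2, n2, minimum=10):
    """Check that each group has at least `minimum` successes and failures."""
    counts = {
        "n1 * p_hat_1": x1,             # successes in group 1
        "n1 * (1 - p_hat_1)": n1 - x1,  # failures in group 1
        "n2 * p_hat_2": x2,             # successes in group 2
        "n2 * (1 - p_hat_2)": n2 - x2,  # failures in group 2
    }
    for label, count in counts.items():
        print(f"{label} = {count}")
    return all(count >= minimum for count in counts.values())


# Fox tracks data: before the intervention is group 1, after is group 2.
print(requirements_met(576, 950, 268, 1359))
```

For the fox tracks data, the four counts are 576, 374, 268, and 1091, so the check returns `True`.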
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
22. What is the value of $z^*$ for a 93% confidence interval?
<a href="javascript:showhide('Q22')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q22" style="display:none;">
<center>$z^* = 1.8119$</center>
</div>
<br>
23. Find the 93% confidence interval for the difference in the proportions of the times fox prints are observed in the sand-pads before and after the effort to reduce the fox population.
<a href="javascript:showhide('Q23')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q23" style="display:none;">
<center>$(0.374, 0.444)$</center>
</div>
</div>
<br>
## Summary
<div class="SummaryHeading">Remember...</div>
<div class="Summary">
- When conducting hypothesis tests using two proportions, the null hypothesis is always $p_1=p_2$, indicating that there is no difference between the two proportions. The alternative hypothesis can be left-tailed ($<$), right-tailed ($>$), or two-tailed ($\ne$).
- For a hypothesis test and confidence interval of two proportions, we use the following symbols:
$$
\begin{array}{lcl}
\text{Sample proportion for group 1:} & \hat p_1 = \displaystyle{\frac{x_1}{n_1}} \\
\text{Sample proportion for group 2:} & \hat p_2 = \displaystyle{\frac{x_2}{n_2}}
\end{array}
$$
- For a hypothesis test only, we use the following symbols:
$$
\begin{array}{lcl}
\text{Overall sample proportion:} & \hat p = \displaystyle{\frac{x_1+x_2}{n_1+n_2}}
\end{array}
$$
- Whenever zero is contained in the confidence interval for the difference of the true proportions, we conclude that there is no significant difference between the two proportions.
- You will use the Excel spreadsheet [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx) to perform hypothesis testing and calculate confidence intervals for problems involving two proportions.
<br>
</div>
<br>
## Navigation
<center>
| **Previous Reading** | **This Reading** | **Next Reading** |
| :------------------: | :--------------: | :--------------: |
| [Lesson 17: <br> Inference for One Proportion](Lesson17.html) | Lesson 18: <br> Inference for Two Proportions | [Lesson 19: <br> Inference for Independence of Categorical Data](Lesson19.html) |
</center>