forked from saundersg/BYUI_M221_Book_R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Lesson07.Rmd
303 lines (193 loc) · 15.2 KB
/
Lesson07.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
---
title: "Lesson 7: Calculating Probabilities involving the Sample Mean"
output:
html_document:
theme: cerulean
toc: true
toc_float: false
---
<script type = "text/javascript">
function showhide(id) {
var e = document.getElementById(id);
e.style.display = (e.style.display == 'block') ? 'none' : 'block';
}
</script>
<!--
<div style = "float:right;width = 40%;">
<br/>
<div style = "padding-left:10%;">**Optional Lesson Video**</div>
<iframe width = "90%" align = "right" src = "https://www.youtube.com/embed/videoseries?list = PLaZryQtbPQC9Vm2oOb7MbspQvlvrkQba1" frameborder = "1" allow = "autoplay; encrypted-media" allowfullscreen></iframe>
</div>
<br>
-->
## Lesson Outcomes
<a href = "javascript:showhide('oc')"><span style = "font-size:8pt;">Show/Hide Outcomes</span></a>
<div id = "oc" style = "display:none;">
By the end of this lesson, you should be able to:
* Calculate probabilities for a sample mean using the normal distribution
</div>
<br>
## Review of Sampling Distributions
When is the sample mean normally distributed (or approximately normally distributed)? This happens when either of the two conditions are satisfied:
- The parent population is normally distributed, so the sample mean is automatically normally distributed.
- The sample size is large, and the Central Limit Theorem implies that the sample mean is normally distributed.
For the mean of draws from a random variable with mean $\mu$ and standard deviation $\sigma$, the following are true:
- The mean of the random variable $\bar X$ is $\mu$.
- The standard deviation of the random variable $\bar X$ is $\displaystyle{\frac{\sigma}{\sqrt{n}}}$.
Previously, you learned how to use the normal probability applet to convert a $z$-score to a corresponding area under the curve. These skills will be applied again in this activity. The following questions will help you brush up on how to use the applet.
<center>
<img src = "./Images/L04_A06_Graph.jpg">
</center>
<br>
<div class = "QuestionsHeading">Answer the following questions:</div>
<div class = "Questions">
1. Look at the output from the applet, given above. What is the probability that a standard normal random variable will be below 1.25?
<a href = "javascript:showhide('Q1')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q1" style = "display:none;">
<center>$0.8944$</center>
</div>
<br>
2. Look at the output from the applet, given above. What is the probability that a standard normal random variable will be above 1.25?
<a href = "javascript:showhide('Q2')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q2" style = "display:none;">
<center>$1 - 0.8944 = 0.1056$</center>
</div>
<br>
3. Challenge problem: Use the output from the applet above to determine the probability that a standard normal random variable will be between -1.25 and 1.25.
<a href = "javascript:showhide('Q3')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q3" style = "display:none;">
* Since the probability in the right tail above 1.25 is 0.106, the probability in the left tail is also 0.106. So, the probability in the middle, between -1.25 and 1.25 must be $1 – 2(0.106) = 1 – 0.212 = 0.788$.
</div>
<br>
4. Use the normal probability applet to find the probability that a standard normal random variable will be greater than -0.75.
<a href = "javascript:showhide('Q4')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q4" style = "display:none;">
* Using the normal probability applet, we find that the area to the right of $z$ = -0.75 is $0.773$.
</div>
<br>
5. If the mean of a normally distributed random variable is -23 and the standard deviation is 7, what is the probability that the random variable will have a value that is less than -25?
<a href = "javascript:showhide('Q5')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q5" style = "display:none;">
* The $z$-score is:
<br>
<center>
$\displaystyle{z = \frac{-25 - (-23)}{7} = -0.2857}$
</center>
<br>
* Using the normal probability applet, we find that the area to the left of $z$ = -0.2857 is $0.3874$.
</div>
</div>
<br>
## Probabilities Involving a Mean
If the sample mean is normally distributed, then we can consider the sample mean, $\bar x$, as one observation from a normal distribution with mean $\mu$ and standard deviation $\displaystyle{\frac{\sigma}{\sqrt{n}}}$. With this in mind, we can compute the $z$-score for any value of this random variable:
$$z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{\bar x-\mu}{\sigma / \sqrt{n} }$$
Notice that we just replaced the "value" with the normal random variable, the "mean" with its mean and "standard deviation" with its standard deviation.
The $z$-score follows a standard normal distribution. That is, it has a mean of 0 and a standard deviation of 1. After we compute the $z$-score, we can use the applet to find the probability that a randomly selected mean will be above or below a given value of $\bar x$.
**Worked Example: Finding the Area under a Normal Curve (Based on a Sample Mean)**
After finding the $z$-score, we can use the normal probability applet to find the area under the curve (i.e. the probability.) Suppose that a sample of size $n = 4$ has been drawn from a normal population with mean $\mu = 7$ and standard deviation $\sigma = 3$. Since the parent population is normal, we know the sampling distribution of the sample mean $\bar x$ will be normal with mean $\mu = 7$ and standard deviation $\displaystyle{ \frac{\sigma}{\sqrt{n}} = \frac{3}{\sqrt{4}} = 1.5}$. We can use this information to find the probability that $\bar x$ will be greater than 10.
The $z$-score is
$$z = \frac{\bar x-\mu}{\sigma / \sqrt{n} } = \frac{10 - 7}{3 / \sqrt{4} } = 2.0$$
The probability that $z$ will be greater than $2.00$ is the area under the standard normal distribution to the right of $2.00$. We find this area using the normal probability applet.
<img src = "./Images/L04_A06_Graph_2.jpg">
The area to the right of $z = 2.00$ is $0.02275$. This is the probability that the random sample of $n = 4$ items will have a mean that is greater than $10$.
In this example, the sample mean was automatically normally distributed because the parent population was normally distributed. This will always be true, no matter what size sample is drawn.
**Worked Example: Finding the Area under a Normal Curve (Based on a Sample Mean)**
What do we do if the parent population is not normally distributed? If the sample size is large, then the Central Limit Theorem guarantees that the sample mean will be approximately normally distributed. Based on this, we can still do normal probability calculations for the mean of a random sample.
The distribution of the weekly costs incurred by Global Solutions Unlimited is right skewed. The population mean of the costs is \$26,400 and the standard deviation is \$23,200. A random sample of $n = 40$ weeks is to be selected, what is the probability that the mean weekly costs will be less than \$20,000?
Since the number of observations is large, the Central Limit Theorem assures that the sample mean will follow a normal distribution. The $z$-score for $\bar x = $ \$20,000 is:
$$z = \frac{\bar x-\mu}{\sigma / \sqrt{n} } = \frac{20,000-26,400}{23,200 / \sqrt{40}} = -1.745 $$
Using the normal probability applet, we can find the area to the left of this $z$-score:
<img src = "./Images/L04_A06_Graph_3.jpg">
The area to the left of $z = -1.745$ is $0.04049$. This is the probability that the mean costs for the $n = 40$ weeks will be less than \$20,000.
<br>
#### Example: Environmental Clean Up
<!-- This picture comes from http://happeningstock.deviantart.com/art/Grass-Hill-Area-Closed-Sign-255592183 and has an open license -->
<img src = "./Images/L07_AREA_CLOSED_sign.jpg">
We will now consider a complete example that shows how these probabilities are used in practice.
The United States Government decided to open some land near a uranium enrichment facility to public use. After a few years of the public hiking, biking, and sometimes even hunting on this land, workers from the facility noticed that there were several unnatural-looking mounds in the earth near the area. Because this land was once used by the facility and nobody knew the origin of these piles, the government closed public access to the land until they could assess if the mounds were safe.
<br>
<img src = "./Images/Step1.png">
**Step 1: Design the study.**
Measurements were taken from the mounds to assess one of the contaminants, lead. The tests involved are very expensive. Each sample costs about \$600 to process.
The Environmental Protection Agency (EPA) has set a "No Action Level" (NAL) for lead. If the mean concentration in the soil of the contaminant is less than the NAL, then the area can be declared safe for public use. If the concentration of the contaminant reaches or exceeds the NAL, the site must be cleaned additionally before it is declared safe. The NAL for lead is 50 milligrams of lead per kilogram of soil (mg/kg).
The hypothesis for this test are:
$H_0:~~\mu = 50 \frac{\text{mg}}{\text{kg}}$
$H_a:~~\mu < 50 \frac{\text{mg}}{\text{kg}}$
In environmental testing, we always assume the site is dirty. That is, our null hypothesis is that the mean level of contamination is at the NAL. We gather data to determine if there is sufficient evidence to support rejecting the null hypothesis.
<br>
<img src = "./Images/Step2.png">
**Step 2: Collect Data**
Scientists collected $n = 61$ measurements of the lead concentration in the soil, measured in mg/kg. Measurements of the lead contamination in the soil samples from the uranium plant are given in the file [uranium_plant_lead](Data/uranium_plant_lead.xlsx).
<br>
<img src = "./Images/Step3.png">
**Step 3: Describe the Data**
<div class = "QuestionsHeading">Answer the following question:</div>
<div class = "Questions">
6. Create a histogram of the lead contamination data.
<a href = "javascript:showhide('Q6')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q6" style = "display:none;">
<img src = "./Images/LeadContamination-Histogram.png">
</div>
<br>
7. What is the shape of the distribution of the data?
<a href = "javascript:showhide('Q7')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q7" style = "display:none;">
* The lead concentration data are right skewed.
</div>
<br>
8. Are any of the measured values above the NAL (50 mg/kg)? What might this suggest?
<a href = "javascript:showhide('Q8')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q8" style = "display:none;">
* Yes, there are several observations that are above the NAL. This may suggest that the lead concentrations are very high in some isolated soil samples. It may also be an indication of variability in the test results.
</div>
<br>
9. Visually, does it look like the mean lead concentration is less than 50 mg/kg?
<a href = "javascript:showhide('Q9')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q9" style = "display:none;">
* Answers may vary.
</div>
<br>
10. What is the value of the sample mean?
<a href = "javascript:showhide('Q10')"><span style = "font-size:8pt;">Show/Hide Solution</span></a>
<div id = "Q10" style = "display:none;">
* The sample mean is 29.13 mg/kg.
</div>
</div>
<br>
<img src = "./Images/Step4.png">
**Step 4: Make Inferences**
We assume the null hypothesis is true and we gather evidence against this requirement. We will find the probability that the mean lead concentration is less than $\bar x$ = 29.13 mg/kg, assuming that the true mean lead concentration is $\mu$ = 50 mg/kg. This probability is called the $P$-value.
Since the sample size is large ($n = 61$), we can conclude that the sample mean is normally distributed. So, we can use the normal probability applet to find the probability that the sample mean will be less than 29.13. Historical analyses show that the population standard deviation is $\sigma$ = 24 mg/kg.
First, we compute the $z$-score.
$$z = \frac{\bar x-\mu}{\sigma / \sqrt{n} } = \frac{29.13 - 50}{24 / \sqrt{61} } = -6.79$$
Using the applet, we find the area to the left of $z = -6.79$. This is the $P$-value. It is not clear from the image of the applet, but the area to the left was shaded before the $z$-score was entered.
<img src = "./Images/L04_A06_Graph_4.jpg">
The $P$-value for this test is $5.6 \times 10^{-12} = 0.000~000~000~005~6$. This is a very small probability. Assuming the null hypothesis is true-that is, the site is unacceptably contaminated-it is very unlikely that we would find such a low mean contamination level among the $n = 61$ randomly selected soil samples.
Since the $P$-value is low, we reject the null hypothesis.
<br>
<img src = "./Images/Step5.png">
**Step 5: Take Action**
There is sufficient evidence to suggest that the mean lead level is less than 50 mg/kg. We conclude that the lead concentration in the soil is low enough that it is not a danger to the public. Based on the results of this and other similar test results, the government has reopened public access to this area.
<br>
## Review of Key Concepts
1. To compute probabilities involving the sample mean $\bar{x}$ we must first determine if $\bar{x}$ is normally distributed. If it is not, we cannot compute probabilities. There are two cases when $\bar{x}$ is guaranteed to be normally distributed. They are:
* The parent population is Normally distributed.
* The Central Limit Theorem guarantees the distribution of $\bar{x}$ is Normal if the sample size $n$ is large enough. For this course, we require $n \geq 30$.
2. The collection of all possible sample means $\bar{x}$ is also a population. The mean of the distribution of sample means is the population mean $\mu$. The standard deviation of the distribution of sample means is $\frac{\sigma}{\sqrt{n}}$, where $\sigma$ is the parent population standard deviation and $n$ is the sample size.
3. Once we have determined that the sample mean is Normally distributed, we can compute probabilities with $\bar{x}$. We first compute $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ and then use the Normal Probability Applet to compute probabilities about $z$.
<br>
## Summary
<div class = "SummaryHeading">Remember...</div>
<div class = "Summary">
- When the distribution of sample means is normally distributed, we can use **z-scores** and the **probability applet** to calculate proportions and probabilities. A z-score is calculated as: $\displaystyle{z = \frac{\text{value}-\text{mean}}{\text{standard deviation}} = \frac{\bar x-\mu}{\sigma/\sqrt{n}}}$
- The **$P$-value** is the probability of getting a test statistic at least as extreme as the one you got, assuming $H_0$ is true. A $P$-value is calculated by finding the area under the normal distribution curve that is more extreme (farther away from the mean) than the z-score.
</div>
<br>
## Navigation
<center>
| **Previous Reading** | **This Reading** | **Next Reading** |
| :------------------: | :--------------: | :--------------: |
| [Lesson 6: <br> Sampling Distribution of the Sample Mean & The Central Limit Theorem](Lesson06.html) | Lesson 07: <br> Probability Calculations involving a Mean Response | [Lesson 8: <br> Review for Exam 1](Lesson08.html) |
</center>