<!DOCTYPE html>
<html lang="" xml:lang="">
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Chapter 5 Basic Regression | Modern Biological Data Analysis</title>
<meta name="description" content="An open-source and fully reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="generator" content="bookdown 0.22.3 and GitBook 2.6.7" />
<meta property="og:title" content="Chapter 5 Basic Regression | Modern Biological Data Analysis" />
<meta property="og:type" content="book" />
<meta property="og:url" content="https://moderndive.com/" />
<meta property="og:image" content="https://moderndive.com//images/logos/book_cover.png" />
<meta property="og:description" content="An open-source and fully reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="github-repo" content="moderndive/ModernDive_book" />
<meta name="twitter:card" content="summary" />
<meta name="twitter:title" content="Chapter 5 Basic Regression | Modern Biological Data Analysis" />
<meta name="twitter:site" content="@ModernDive" />
<meta name="twitter:description" content="An open-source and fully reproducible electronic textbook for teaching statistical inference using tidyverse data science tools." />
<meta name="twitter:image" content="https://moderndive.com//images/logos/book_cover.png" />
<meta name="author" content="Chester Ismay and Albert Y. Kim; Foreword by Kelly S. McConville; Adapted by William R. Morgan" />
<meta name="date" content="2021-08-24" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta name="apple-mobile-web-app-capable" content="yes" />
<meta name="apple-mobile-web-app-status-bar-style" content="black" />
<link rel="apple-touch-icon-precomposed" sizes="152x152" href="images/logos/favicons/apple-touch-icon.png" />
<link rel="shortcut icon" href="images/logos/favicons/favicon.ico" type="image/x-icon" />
<link rel="prev" href="4-tidy.html"/>
<link rel="next" href="6-multiple-regression.html"/>
<script src="libs/header-attrs-2.9/header-attrs.js"></script>
<script src="libs/jquery-2.2.3/jquery.min.js"></script>
<link href="libs/gitbook-2.6.7/css/style.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-table.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-bookdown.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-highlight.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-search.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-fontsettings.css" rel="stylesheet" />
<link href="libs/gitbook-2.6.7/css/plugin-clipboard.css" rel="stylesheet" />
<link href="libs/anchor-sections-1.0.1/anchor-sections.css" rel="stylesheet" />
<script src="libs/anchor-sections-1.0.1/anchor-sections.js"></script>
<script src="libs/kePrint-0.0.1/kePrint.js"></script>
<link href="libs/lightable-0.0.1/lightable.css" rel="stylesheet" />
<script src="libs/htmlwidgets-1.5.3/htmlwidgets.js"></script>
<link href="libs/dygraphs-1.1.1/dygraph.css" rel="stylesheet" />
<script src="libs/dygraphs-1.1.1/dygraph-combined.js"></script>
<script src="libs/dygraphs-1.1.1/shapes.js"></script>
<script src="libs/moment-2.8.4/moment.js"></script>
<script src="libs/moment-timezone-0.2.5/moment-timezone-with-data.js"></script>
<script src="libs/moment-fquarter-1.0.0/moment-fquarter.min.js"></script>
<script src="libs/dygraphs-binding-1.1.1.6/dygraphs.js"></script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-89938436-1', 'auto');
ga('send', 'pageview');
</script>
<style type="text/css">
pre > code.sourceCode { white-space: pre; position: relative; }
pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
pre > code.sourceCode > span:empty { height: 1.2em; }
.sourceCode { overflow: visible; }
code.sourceCode > span { color: inherit; text-decoration: inherit; }
pre.sourceCode { margin: 0; }
@media screen {
div.sourceCode { overflow: auto; }
}
@media print {
pre > code.sourceCode { white-space: pre-wrap; }
pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
}
pre.numberSource code
{ counter-reset: source-line 0; }
pre.numberSource code > span
{ position: relative; left: -4em; counter-increment: source-line; }
pre.numberSource code > span > a:first-child::before
{ content: counter(source-line);
position: relative; left: -1em; text-align: right; vertical-align: baseline;
border: none; display: inline-block;
-webkit-touch-callout: none; -webkit-user-select: none;
-khtml-user-select: none; -moz-user-select: none;
-ms-user-select: none; user-select: none;
padding: 0 4px; width: 4em;
color: #aaaaaa;
}
pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa; padding-left: 4px; }
div.sourceCode
{ }
@media screen {
pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
}
code span.al { color: #ff0000; font-weight: bold; } /* Alert */
code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
code span.at { color: #7d9029; } /* Attribute */
code span.bn { color: #40a070; } /* BaseN */
code span.bu { } /* BuiltIn */
code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
code span.ch { color: #4070a0; } /* Char */
code span.cn { color: #880000; } /* Constant */
code span.co { color: #60a0b0; font-style: italic; } /* Comment */
code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
code span.do { color: #ba2121; font-style: italic; } /* Documentation */
code span.dt { color: #902000; } /* DataType */
code span.dv { color: #40a070; } /* DecVal */
code span.er { color: #ff0000; font-weight: bold; } /* Error */
code span.ex { } /* Extension */
code span.fl { color: #40a070; } /* Float */
code span.fu { color: #06287e; } /* Function */
code span.im { } /* Import */
code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
code span.kw { color: #007020; font-weight: bold; } /* Keyword */
code span.op { color: #666666; } /* Operator */
code span.ot { color: #007020; } /* Other */
code span.pp { color: #bc7a00; } /* Preprocessor */
code span.sc { color: #4070a0; } /* SpecialChar */
code span.ss { color: #bb6688; } /* SpecialString */
code span.st { color: #4070a0; } /* String */
code span.va { color: #19177c; } /* Variable */
code span.vs { color: #4070a0; } /* VerbatimString */
code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
</style>
<style type="text/css">
/* Used with Pandoc 2.11+ new --citeproc when CSL is used */
div.csl-bib-body { }
div.csl-entry {
clear: both;
}
.hanging div.csl-entry {
margin-left:2em;
text-indent:-2em;
}
div.csl-left-margin {
min-width:2em;
float:left;
}
div.csl-right-inline {
margin-left:2em;
padding-left:1em;
}
div.csl-indent {
margin-left: 2em;
}
</style>
<link rel="stylesheet" href="style.css" type="text/css" />
</head>
<body>
<div class="book without-animation with-summary font-size-2 font-family-1" data-basepath=".">
<div class="book-summary">
<nav role="navigation">
<ul class="summary">
<li class="chapter" data-level="" data-path="index.html"><a href="index.html"><i class="fa fa-check"></i>Welcome to ModernDive</a></li>
<li class="chapter" data-level="" data-path="foreword.html"><a href="foreword.html"><i class="fa fa-check"></i>Foreword</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html"><i class="fa fa-check"></i>Preface</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#introduction-for-students"><i class="fa fa-check"></i>Introduction for students</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#what-we-hope-you-will-learn-from-this-book"><i class="fa fa-check"></i>What we hope you will learn from this book</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#datascience-pipeline"><i class="fa fa-check"></i>Data/science pipeline</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#reproducible-research"><i class="fa fa-check"></i>Reproducible research</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#final-note-for-students"><i class="fa fa-check"></i>Final note for students</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#introduction-for-instructors"><i class="fa fa-check"></i>Introduction for instructors</a>
<ul>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#resources"><i class="fa fa-check"></i>Resources</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#why-did-we-write-this-book"><i class="fa fa-check"></i>Why did we write this book?</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#who-is-this-book-for"><i class="fa fa-check"></i>Who is this book for?</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#connect-and-contribute"><i class="fa fa-check"></i>Connect and contribute</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#acknowledgements"><i class="fa fa-check"></i>Acknowledgements</a></li>
<li class="chapter" data-level="" data-path="preface.html"><a href="preface.html#about-this-book"><i class="fa fa-check"></i>About this book</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the authors</a></li>
<li class="chapter" data-level="1" data-path="1-getting-started.html"><a href="1-getting-started.html"><i class="fa fa-check"></i><b>1</b> Getting Started with Data in R</a>
<ul>
<li class="chapter" data-level="1.1" data-path="1-getting-started.html"><a href="1-getting-started.html#r-rstudio"><i class="fa fa-check"></i><b>1.1</b> What are R and RStudio?</a>
<ul>
<li class="chapter" data-level="1.1.1" data-path="1-getting-started.html"><a href="1-getting-started.html#installing"><i class="fa fa-check"></i><b>1.1.1</b> Installing R and RStudio</a></li>
<li class="chapter" data-level="1.1.2" data-path="1-getting-started.html"><a href="1-getting-started.html#using-r-via-rstudio"><i class="fa fa-check"></i><b>1.1.2</b> Using R via RStudio</a></li>
</ul></li>
<li class="chapter" data-level="1.2" data-path="1-getting-started.html"><a href="1-getting-started.html#code"><i class="fa fa-check"></i><b>1.2</b> How do I code in R?</a>
<ul>
<li class="chapter" data-level="1.2.1" data-path="1-getting-started.html"><a href="1-getting-started.html#programming-concepts"><i class="fa fa-check"></i><b>1.2.1</b> Basic programming concepts and terminology</a></li>
<li class="chapter" data-level="1.2.2" data-path="1-getting-started.html"><a href="1-getting-started.html#messages"><i class="fa fa-check"></i><b>1.2.2</b> Errors, warnings, and messages</a></li>
<li class="chapter" data-level="1.2.3" data-path="1-getting-started.html"><a href="1-getting-started.html#tips-code"><i class="fa fa-check"></i><b>1.2.3</b> Tips on learning to code</a></li>
</ul></li>
<li class="chapter" data-level="1.3" data-path="1-getting-started.html"><a href="1-getting-started.html#packages"><i class="fa fa-check"></i><b>1.3</b> What are R packages?</a>
<ul>
<li class="chapter" data-level="1.3.1" data-path="1-getting-started.html"><a href="1-getting-started.html#package-installation"><i class="fa fa-check"></i><b>1.3.1</b> Package installation</a></li>
<li class="chapter" data-level="1.3.2" data-path="1-getting-started.html"><a href="1-getting-started.html#package-loading"><i class="fa fa-check"></i><b>1.3.2</b> Package loading</a></li>
<li class="chapter" data-level="1.3.3" data-path="1-getting-started.html"><a href="1-getting-started.html#package-use"><i class="fa fa-check"></i><b>1.3.3</b> Package use</a></li>
</ul></li>
<li class="chapter" data-level="1.4" data-path="1-getting-started.html"><a href="1-getting-started.html#rfishbase"><i class="fa fa-check"></i><b>1.4</b> Explore your first datasets</a>
<ul>
<li class="chapter" data-level="1.4.1" data-path="1-getting-started.html"><a href="1-getting-started.html#rfishpackage"><i class="fa fa-check"></i><b>1.4.1</b> <code>rfishbase</code> package</a></li>
<li class="chapter" data-level="1.4.2" data-path="1-getting-started.html"><a href="1-getting-started.html#fishbasedataframe"><i class="fa fa-check"></i><b>1.4.2</b> <code>fishbase</code> data frame</a></li>
<li class="chapter" data-level="1.4.3" data-path="1-getting-started.html"><a href="1-getting-started.html#exploredataframes"><i class="fa fa-check"></i><b>1.4.3</b> Exploring data frames</a></li>
<li class="chapter" data-level="1.4.4" data-path="1-getting-started.html"><a href="1-getting-started.html#identification-vs-measurement-variables"><i class="fa fa-check"></i><b>1.4.4</b> Identification and measurement variables</a></li>
<li class="chapter" data-level="1.4.5" data-path="1-getting-started.html"><a href="1-getting-started.html#help-files"><i class="fa fa-check"></i><b>1.4.5</b> Help files</a></li>
</ul></li>
<li class="chapter" data-level="1.5" data-path="1-getting-started.html"><a href="1-getting-started.html#conclusion"><i class="fa fa-check"></i><b>1.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="1.5.1" data-path="1-getting-started.html"><a href="1-getting-started.html#additional-resources"><i class="fa fa-check"></i><b>1.5.1</b> Additional resources</a></li>
<li class="chapter" data-level="1.5.2" data-path="1-getting-started.html"><a href="1-getting-started.html#whats-to-come"><i class="fa fa-check"></i><b>1.5.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>I Data Science with tidyverse</b></span></li>
<li class="chapter" data-level="2" data-path="2-viz.html"><a href="2-viz.html"><i class="fa fa-check"></i><b>2</b> Data Visualization</a>
<ul>
<li class="chapter" data-level="" data-path="2-viz.html"><a href="2-viz.html#needed-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="2.1" data-path="2-viz.html"><a href="2-viz.html#grammarofgraphics"><i class="fa fa-check"></i><b>2.1</b> The grammar of graphics</a>
<ul>
<li class="chapter" data-level="2.1.1" data-path="2-viz.html"><a href="2-viz.html#components-of-the-grammar"><i class="fa fa-check"></i><b>2.1.1</b> Components of the grammar</a></li>
<li class="chapter" data-level="2.1.2" data-path="2-viz.html"><a href="2-viz.html#gapminder"><i class="fa fa-check"></i><b>2.1.2</b> Gapminder data</a></li>
<li class="chapter" data-level="2.1.3" data-path="2-viz.html"><a href="2-viz.html#other-components"><i class="fa fa-check"></i><b>2.1.3</b> Other components</a></li>
<li class="chapter" data-level="2.1.4" data-path="2-viz.html"><a href="2-viz.html#ggplot2-package"><i class="fa fa-check"></i><b>2.1.4</b> ggplot2 package</a></li>
</ul></li>
<li class="chapter" data-level="2.2" data-path="2-viz.html"><a href="2-viz.html#FiveNG"><i class="fa fa-check"></i><b>2.2</b> Five named graphs - the 5NG</a></li>
<li class="chapter" data-level="2.3" data-path="2-viz.html"><a href="2-viz.html#scatterplots"><i class="fa fa-check"></i><b>2.3</b> 5NG#1: Scatterplots</a>
<ul>
<li class="chapter" data-level="2.3.1" data-path="2-viz.html"><a href="2-viz.html#geompoint"><i class="fa fa-check"></i><b>2.3.1</b> Scatterplots via <code>geom_point</code></a></li>
<li class="chapter" data-level="2.3.2" data-path="2-viz.html"><a href="2-viz.html#overplotting"><i class="fa fa-check"></i><b>2.3.2</b> Overplotting</a></li>
<li class="chapter" data-level="2.3.3" data-path="2-viz.html"><a href="2-viz.html#summary"><i class="fa fa-check"></i><b>2.3.3</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.4" data-path="2-viz.html"><a href="2-viz.html#linegraphs"><i class="fa fa-check"></i><b>2.4</b> 5NG#2: Linegraphs</a>
<ul>
<li class="chapter" data-level="2.4.1" data-path="2-viz.html"><a href="2-viz.html#geomline"><i class="fa fa-check"></i><b>2.4.1</b> Linegraphs via <code>geom_line</code></a></li>
<li class="chapter" data-level="2.4.2" data-path="2-viz.html"><a href="2-viz.html#summary-1"><i class="fa fa-check"></i><b>2.4.2</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.5" data-path="2-viz.html"><a href="2-viz.html#facets"><i class="fa fa-check"></i><b>2.5</b> Facets</a></li>
<li class="chapter" data-level="2.6" data-path="2-viz.html"><a href="2-viz.html#histograms"><i class="fa fa-check"></i><b>2.6</b> 5NG#3: Histograms</a>
<ul>
<li class="chapter" data-level="2.6.1" data-path="2-viz.html"><a href="2-viz.html#geomhistogram"><i class="fa fa-check"></i><b>2.6.1</b> Histograms via <code>geom_histogram</code></a></li>
<li class="chapter" data-level="2.6.2" data-path="2-viz.html"><a href="2-viz.html#adjustbins"><i class="fa fa-check"></i><b>2.6.2</b> Adjusting the bins</a></li>
<li class="chapter" data-level="2.6.3" data-path="2-viz.html"><a href="2-viz.html#summary-2"><i class="fa fa-check"></i><b>2.6.3</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.7" data-path="2-viz.html"><a href="2-viz.html#boxplots"><i class="fa fa-check"></i><b>2.7</b> 5NG#4: Boxplots</a>
<ul>
<li class="chapter" data-level="2.7.1" data-path="2-viz.html"><a href="2-viz.html#geomboxplot"><i class="fa fa-check"></i><b>2.7.1</b> Boxplots via <code>geom_boxplot</code></a></li>
<li class="chapter" data-level="2.7.2" data-path="2-viz.html"><a href="2-viz.html#summary-3"><i class="fa fa-check"></i><b>2.7.2</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.8" data-path="2-viz.html"><a href="2-viz.html#geombar"><i class="fa fa-check"></i><b>2.8</b> 5NG#5: Barplots</a>
<ul>
<li class="chapter" data-level="2.8.1" data-path="2-viz.html"><a href="2-viz.html#barplots-via-geom_bar-or-geom_col"><i class="fa fa-check"></i><b>2.8.1</b> Barplots via <code>geom_bar</code> or <code>geom_col</code></a></li>
<li class="chapter" data-level="2.8.2" data-path="2-viz.html"><a href="2-viz.html#must-avoid-pie-charts"><i class="fa fa-check"></i><b>2.8.2</b> Must avoid pie charts!</a></li>
<li class="chapter" data-level="2.8.3" data-path="2-viz.html"><a href="2-viz.html#two-categ-barplot"><i class="fa fa-check"></i><b>2.8.3</b> Two categorical variables</a></li>
<li class="chapter" data-level="2.8.4" data-path="2-viz.html"><a href="2-viz.html#summary-4"><i class="fa fa-check"></i><b>2.8.4</b> Summary</a></li>
</ul></li>
<li class="chapter" data-level="2.9" data-path="2-viz.html"><a href="2-viz.html#data-vis-conclusion"><i class="fa fa-check"></i><b>2.9</b> Conclusion</a>
<ul>
<li class="chapter" data-level="2.9.1" data-path="2-viz.html"><a href="2-viz.html#summary-table"><i class="fa fa-check"></i><b>2.9.1</b> Summary table</a></li>
<li class="chapter" data-level="2.9.2" data-path="2-viz.html"><a href="2-viz.html#function-argument-specification"><i class="fa fa-check"></i><b>2.9.2</b> Function argument specification</a></li>
<li class="chapter" data-level="2.9.3" data-path="2-viz.html"><a href="2-viz.html#additional-resources-1"><i class="fa fa-check"></i><b>2.9.3</b> Additional resources</a></li>
<li class="chapter" data-level="2.9.4" data-path="2-viz.html"><a href="2-viz.html#whats-to-come-3"><i class="fa fa-check"></i><b>2.9.4</b> What’s to come</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="3" data-path="3-wrangling.html"><a href="3-wrangling.html"><i class="fa fa-check"></i><b>3</b> Data Wrangling</a>
<ul>
<li class="chapter" data-level="" data-path="3-wrangling.html"><a href="3-wrangling.html#wrangling-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="3.1" data-path="3-wrangling.html"><a href="3-wrangling.html#piping"><i class="fa fa-check"></i><b>3.1</b> The pipe operator: <code>%>%</code></a></li>
<li class="chapter" data-level="3.2" data-path="3-wrangling.html"><a href="3-wrangling.html#filter"><i class="fa fa-check"></i><b>3.2</b> <code>filter</code> rows</a></li>
<li class="chapter" data-level="3.3" data-path="3-wrangling.html"><a href="3-wrangling.html#slice-rows"><i class="fa fa-check"></i><b>3.3</b> <code>slice</code> rows</a></li>
<li class="chapter" data-level="3.4" data-path="3-wrangling.html"><a href="3-wrangling.html#select"><i class="fa fa-check"></i><b>3.4</b> <code>select</code> variables</a>
<ul>
<li class="chapter" data-level="3.4.1" data-path="3-wrangling.html"><a href="3-wrangling.html#rename"><i class="fa fa-check"></i><b>3.4.1</b> <code>rename</code> variables</a></li>
</ul></li>
<li class="chapter" data-level="3.5" data-path="3-wrangling.html"><a href="3-wrangling.html#summarize"><i class="fa fa-check"></i><b>3.5</b> <code>summarize</code> variables</a></li>
<li class="chapter" data-level="3.6" data-path="3-wrangling.html"><a href="3-wrangling.html#groupby"><i class="fa fa-check"></i><b>3.6</b> <code>group_by</code> rows</a>
<ul>
<li class="chapter" data-level="3.6.1" data-path="3-wrangling.html"><a href="3-wrangling.html#grouping-by-more-than-one-variable"><i class="fa fa-check"></i><b>3.6.1</b> Grouping by more than one variable</a></li>
</ul></li>
<li class="chapter" data-level="3.7" data-path="3-wrangling.html"><a href="3-wrangling.html#mutate"><i class="fa fa-check"></i><b>3.7</b> <code>mutate</code> existing variables</a></li>
<li class="chapter" data-level="3.8" data-path="3-wrangling.html"><a href="3-wrangling.html#arrange"><i class="fa fa-check"></i><b>3.8</b> <code>arrange</code> and sort rows</a></li>
<li class="chapter" data-level="3.9" data-path="3-wrangling.html"><a href="3-wrangling.html#wrangling-conclusion"><i class="fa fa-check"></i><b>3.9</b> Conclusion</a>
<ul>
<li class="chapter" data-level="3.9.1" data-path="3-wrangling.html"><a href="3-wrangling.html#summary-table-1"><i class="fa fa-check"></i><b>3.9.1</b> Summary table</a></li>
<li class="chapter" data-level="3.9.2" data-path="3-wrangling.html"><a href="3-wrangling.html#additional-resources-2"><i class="fa fa-check"></i><b>3.9.2</b> Additional resources</a></li>
<li class="chapter" data-level="3.9.3" data-path="3-wrangling.html"><a href="3-wrangling.html#whats-to-come-1"><i class="fa fa-check"></i><b>3.9.3</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="4" data-path="4-tidy.html"><a href="4-tidy.html"><i class="fa fa-check"></i><b>4</b> Data Importing and “Tidy” Data</a>
<ul>
<li class="chapter" data-level="" data-path="4-tidy.html"><a href="4-tidy.html#tidy-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="4.1" data-path="4-tidy.html"><a href="4-tidy.html#csv"><i class="fa fa-check"></i><b>4.1</b> Importing data</a>
<ul>
<li class="chapter" data-level="4.1.1" data-path="4-tidy.html"><a href="4-tidy.html#using-the-console"><i class="fa fa-check"></i><b>4.1.1</b> Using the console</a></li>
<li class="chapter" data-level="4.1.2" data-path="4-tidy.html"><a href="4-tidy.html#using-rstudios-interface"><i class="fa fa-check"></i><b>4.1.2</b> Using RStudio’s interface</a></li>
</ul></li>
<li class="chapter" data-level="4.2" data-path="4-tidy.html"><a href="4-tidy.html#tidy-data-ex"><i class="fa fa-check"></i><b>4.2</b> “Tidy” data</a>
<ul>
<li class="chapter" data-level="4.2.1" data-path="4-tidy.html"><a href="4-tidy.html#tidy-definition"><i class="fa fa-check"></i><b>4.2.1</b> Definition of “tidy” data</a></li>
<li class="chapter" data-level="4.2.2" data-path="4-tidy.html"><a href="4-tidy.html#converting-to-tidy-data"><i class="fa fa-check"></i><b>4.2.2</b> Converting to “tidy” data</a></li>
</ul></li>
<li class="chapter" data-level="4.3" data-path="4-tidy.html"><a href="4-tidy.html#case-study-tidy"><i class="fa fa-check"></i><b>4.3</b> Case study: Weight loss data</a></li>
<li class="chapter" data-level="4.4" data-path="4-tidy.html"><a href="4-tidy.html#tidyverse-package"><i class="fa fa-check"></i><b>4.4</b> <code>tidyverse</code> package</a></li>
<li class="chapter" data-level="4.5" data-path="4-tidy.html"><a href="4-tidy.html#tidy-data-conclusion"><i class="fa fa-check"></i><b>4.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="4.5.1" data-path="4-tidy.html"><a href="4-tidy.html#additional-resources-3"><i class="fa fa-check"></i><b>4.5.1</b> Additional resources</a></li>
<li class="chapter" data-level="4.5.2" data-path="4-tidy.html"><a href="4-tidy.html#whats-to-come-2"><i class="fa fa-check"></i><b>4.5.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>II Data Modeling with moderndive</b></span></li>
<li class="chapter" data-level="5" data-path="5-regression.html"><a href="5-regression.html"><i class="fa fa-check"></i><b>5</b> Basic Regression</a>
<ul>
<li class="chapter" data-level="" data-path="5-regression.html"><a href="5-regression.html#reg-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="5.1" data-path="5-regression.html"><a href="5-regression.html#model1"><i class="fa fa-check"></i><b>5.1</b> One numerical explanatory variable</a>
<ul>
<li class="chapter" data-level="5.1.1" data-path="5-regression.html"><a href="5-regression.html#model1EDA"><i class="fa fa-check"></i><b>5.1.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="5.1.2" data-path="5-regression.html"><a href="5-regression.html#model1table"><i class="fa fa-check"></i><b>5.1.2</b> Simple linear regression</a></li>
<li class="chapter" data-level="5.1.3" data-path="5-regression.html"><a href="5-regression.html#model1points"><i class="fa fa-check"></i><b>5.1.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="5.2" data-path="5-regression.html"><a href="5-regression.html#model2"><i class="fa fa-check"></i><b>5.2</b> One categorical explanatory variable</a>
<ul>
<li class="chapter" data-level="5.2.1" data-path="5-regression.html"><a href="5-regression.html#model2EDA"><i class="fa fa-check"></i><b>5.2.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="5.2.2" data-path="5-regression.html"><a href="5-regression.html#model2table"><i class="fa fa-check"></i><b>5.2.2</b> Linear regression</a></li>
<li class="chapter" data-level="5.2.3" data-path="5-regression.html"><a href="5-regression.html#model2points"><i class="fa fa-check"></i><b>5.2.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="5.3" data-path="5-regression.html"><a href="5-regression.html#reg-related-topics"><i class="fa fa-check"></i><b>5.3</b> Related topics</a>
<ul>
<li class="chapter" data-level="5.3.1" data-path="5-regression.html"><a href="5-regression.html#correlation-is-not-causation"><i class="fa fa-check"></i><b>5.3.1</b> Correlation is not necessarily causation</a></li>
<li class="chapter" data-level="5.3.2" data-path="5-regression.html"><a href="5-regression.html#leastsquares"><i class="fa fa-check"></i><b>5.3.2</b> Best-fitting line</a></li>
<li class="chapter" data-level="5.3.3" data-path="5-regression.html"><a href="5-regression.html#underthehood"><i class="fa fa-check"></i><b>5.3.3</b> <code>get_regression_x()</code> functions</a></li>
</ul></li>
<li class="chapter" data-level="5.4" data-path="5-regression.html"><a href="5-regression.html#reg-conclusion"><i class="fa fa-check"></i><b>5.4</b> Conclusion</a>
<ul>
<li class="chapter" data-level="5.4.1" data-path="5-regression.html"><a href="5-regression.html#additional-resources-basic-regression"><i class="fa fa-check"></i><b>5.4.1</b> Additional resources</a></li>
<li class="chapter" data-level="5.4.2" data-path="5-regression.html"><a href="5-regression.html#whats-to-come-4"><i class="fa fa-check"></i><b>5.4.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="6" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html"><i class="fa fa-check"></i><b>6</b> Multiple Regression</a>
<ul>
<li class="chapter" data-level="" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="6.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4"><i class="fa fa-check"></i><b>6.1</b> One numerical and one categorical explanatory variable</a>
<ul>
<li class="chapter" data-level="6.1.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4EDA"><i class="fa fa-check"></i><b>6.1.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="6.1.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4interactiontable"><i class="fa fa-check"></i><b>6.1.2</b> Interaction model</a></li>
<li class="chapter" data-level="6.1.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4table"><i class="fa fa-check"></i><b>6.1.3</b> Parallel slopes model</a></li>
<li class="chapter" data-level="6.1.4" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model4points"><i class="fa fa-check"></i><b>6.1.4</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="6.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3"><i class="fa fa-check"></i><b>6.2</b> Two categorical explanatory variables</a>
<ul>
<li class="chapter" data-level="6.2.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3EDA"><i class="fa fa-check"></i><b>6.2.1</b> Exploratory data analysis</a></li>
<li class="chapter" data-level="6.2.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3table"><i class="fa fa-check"></i><b>6.2.2</b> Regression lines</a></li>
<li class="chapter" data-level="6.2.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model3points"><i class="fa fa-check"></i><b>6.2.3</b> Observed/fitted values and residuals</a></li>
</ul></li>
<li class="chapter" data-level="6.3" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-related-topics"><i class="fa fa-check"></i><b>6.3</b> Related topics</a>
<ul>
<li class="chapter" data-level="6.3.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#model-selection"><i class="fa fa-check"></i><b>6.3.1</b> Model selection using visualizations</a></li>
<li class="chapter" data-level="6.3.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#rsquared"><i class="fa fa-check"></i><b>6.3.2</b> Model selection using R-squared</a></li>
</ul></li>
<li class="chapter" data-level="6.4" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#mult-reg-conclusion"><i class="fa fa-check"></i><b>6.4</b> Conclusion</a>
<ul>
<li class="chapter" data-level="6.4.1" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#additional-resources-4"><i class="fa fa-check"></i><b>6.4.1</b> Additional resources</a></li>
<li class="chapter" data-level="6.4.2" data-path="6-multiple-regression.html"><a href="6-multiple-regression.html#whats-to-come-5"><i class="fa fa-check"></i><b>6.4.2</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>III Statistical Inference with infer</b></span></li>
<li class="chapter" data-level="7" data-path="7-sampling.html"><a href="7-sampling.html"><i class="fa fa-check"></i><b>7</b> Sampling</a>
<ul>
<li class="chapter" data-level="" data-path="7-sampling.html"><a href="7-sampling.html#sampling-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="7.1" data-path="7-sampling.html"><a href="7-sampling.html#sampling-activity"><i class="fa fa-check"></i><b>7.1</b> Sampling bowl activity</a>
<ul>
<li class="chapter" data-level="7.1.1" data-path="7-sampling.html"><a href="7-sampling.html#what-proportion-of-this-bowls-balls-are-red"><i class="fa fa-check"></i><b>7.1.1</b> What proportion of this bowl’s balls are red?</a></li>
<li class="chapter" data-level="7.1.2" data-path="7-sampling.html"><a href="7-sampling.html#using-the-shovel-once"><i class="fa fa-check"></i><b>7.1.2</b> Using the shovel once</a></li>
<li class="chapter" data-level="7.1.3" data-path="7-sampling.html"><a href="7-sampling.html#student-shovels"><i class="fa fa-check"></i><b>7.1.3</b> Using the shovel 33 times</a></li>
<li class="chapter" data-level="7.1.4" data-path="7-sampling.html"><a href="7-sampling.html#sampling-what-did-we-just-do"><i class="fa fa-check"></i><b>7.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="7.2" data-path="7-sampling.html"><a href="7-sampling.html#sampling-simulation"><i class="fa fa-check"></i><b>7.2</b> Virtual sampling</a>
<ul>
<li class="chapter" data-level="7.2.1" data-path="7-sampling.html"><a href="7-sampling.html#using-the-virtual-shovel-once"><i class="fa fa-check"></i><b>7.2.1</b> Using the virtual shovel once</a></li>
</ul></li>
<li class="chapter" data-level="7.3" data-path="7-sampling.html"><a href="7-sampling.html#sampling-framework"><i class="fa fa-check"></i><b>7.3</b> Sampling framework</a>
<ul>
<li class="chapter" data-level="7.3.1" data-path="7-sampling.html"><a href="7-sampling.html#terminology-and-notation"><i class="fa fa-check"></i><b>7.3.1</b> Terminology and notation</a></li>
<li class="chapter" data-level="7.3.2" data-path="7-sampling.html"><a href="7-sampling.html#sampling-definitions"><i class="fa fa-check"></i><b>7.3.2</b> Statistical definitions</a></li>
<li class="chapter" data-level="7.3.3" data-path="7-sampling.html"><a href="7-sampling.html#moral-of-the-story"><i class="fa fa-check"></i><b>7.3.3</b> The moral of the story</a></li>
</ul></li>
<li class="chapter" data-level="7.4" data-path="7-sampling.html"><a href="7-sampling.html#sampling-case-study"><i class="fa fa-check"></i><b>7.4</b> Case study: Genetic crosses</a></li>
<li class="chapter" data-level="7.5" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion-central-limit-theorem"><i class="fa fa-check"></i><b>7.5</b> Central Limit Theorem</a></li>
<li class="chapter" data-level="7.6" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion"><i class="fa fa-check"></i><b>7.6</b> Conclusion</a>
<ul>
<li class="chapter" data-level="7.6.1" data-path="7-sampling.html"><a href="7-sampling.html#sampling-conclusion-table"><i class="fa fa-check"></i><b>7.6.1</b> Sampling scenarios</a></li>
<li class="chapter" data-level="7.6.2" data-path="7-sampling.html"><a href="7-sampling.html#additional-resources-5"><i class="fa fa-check"></i><b>7.6.2</b> Additional resources</a></li>
<li class="chapter" data-level="7.6.3" data-path="7-sampling.html"><a href="7-sampling.html#whats-to-come-6"><i class="fa fa-check"></i><b>7.6.3</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="8" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html"><i class="fa fa-check"></i><b>8</b> Bootstrapping and Confidence Intervals</a>
<ul>
<li class="chapter" data-level="" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#CI-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="8.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-tactile"><i class="fa fa-check"></i><b>8.1</b> Pennies activity</a>
<ul>
<li class="chapter" data-level="8.1.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#what-is-the-average-year-on-us-pennies-in-2019"><i class="fa fa-check"></i><b>8.1.1</b> What is the average year on US pennies in 2019?</a></li>
<li class="chapter" data-level="8.1.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-once"><i class="fa fa-check"></i><b>8.1.2</b> Resampling once</a></li>
<li class="chapter" data-level="8.1.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#student-resamples"><i class="fa fa-check"></i><b>8.1.3</b> Resampling 35 times</a></li>
<li class="chapter" data-level="8.1.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-what-did-we-just-do"><i class="fa fa-check"></i><b>8.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="8.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#resampling-simulation"><i class="fa fa-check"></i><b>8.2</b> Computer simulation of resampling</a>
<ul>
<li class="chapter" data-level="8.2.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#virtually-resampling-once"><i class="fa fa-check"></i><b>8.2.1</b> Virtually resampling once</a></li>
<li class="chapter" data-level="8.2.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-35-replicates"><i class="fa fa-check"></i><b>8.2.2</b> Virtually resampling 35 times</a></li>
<li class="chapter" data-level="8.2.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-1000-replicates"><i class="fa fa-check"></i><b>8.2.3</b> Virtually resampling 1000 times</a></li>
</ul></li>
<li class="chapter" data-level="8.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-build-up"><i class="fa fa-check"></i><b>8.3</b> Understanding confidence intervals</a>
<ul>
<li class="chapter" data-level="8.3.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#percentile-method"><i class="fa fa-check"></i><b>8.3.1</b> Percentile method</a></li>
<li class="chapter" data-level="8.3.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#se-method"><i class="fa fa-check"></i><b>8.3.2</b> Standard error method</a></li>
</ul></li>
<li class="chapter" data-level="8.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-process"><i class="fa fa-check"></i><b>8.4</b> Constructing confidence intervals</a>
<ul>
<li class="chapter" data-level="8.4.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#original-workflow"><i class="fa fa-check"></i><b>8.4.1</b> Original workflow</a></li>
<li class="chapter" data-level="8.4.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#infer-workflow"><i class="fa fa-check"></i><b>8.4.2</b> <code>infer</code> package workflow</a></li>
<li class="chapter" data-level="8.4.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#percentile-method-infer"><i class="fa fa-check"></i><b>8.4.3</b> Percentile method with <code>infer</code></a></li>
<li class="chapter" data-level="8.4.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#infer-se"><i class="fa fa-check"></i><b>8.4.4</b> Standard error method with <code>infer</code></a></li>
</ul></li>
<li class="chapter" data-level="8.5" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#one-prop-ci"><i class="fa fa-check"></i><b>8.5</b> Interpreting confidence intervals</a>
<ul>
<li class="chapter" data-level="8.5.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ilyas-yohan"><i class="fa fa-check"></i><b>8.5.1</b> Did the net capture the fish?</a></li>
<li class="chapter" data-level="8.5.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#shorthand"><i class="fa fa-check"></i><b>8.5.2</b> Precise and shorthand interpretation</a></li>
<li class="chapter" data-level="8.5.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-width"><i class="fa fa-check"></i><b>8.5.3</b> Width of confidence intervals</a></li>
</ul></li>
<li class="chapter" data-level="8.6" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#case-study-two-prop-ci"><i class="fa fa-check"></i><b>8.6</b> Case study: Is yawning contagious?</a>
<ul>
<li class="chapter" data-level="8.6.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#mythbusters-study-data"><i class="fa fa-check"></i><b>8.6.1</b> <em>Mythbusters</em> study data</a></li>
<li class="chapter" data-level="8.6.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#sampling-scenario"><i class="fa fa-check"></i><b>8.6.2</b> Sampling scenario</a></li>
<li class="chapter" data-level="8.6.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-build"><i class="fa fa-check"></i><b>8.6.3</b> Constructing the confidence interval</a></li>
<li class="chapter" data-level="8.6.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#interpreting-the-confidence-interval"><i class="fa fa-check"></i><b>8.6.4</b> Interpreting the confidence interval</a></li>
</ul></li>
<li class="chapter" data-level="8.7" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#ci-conclusion"><i class="fa fa-check"></i><b>8.7</b> Conclusion</a>
<ul>
<li class="chapter" data-level="8.7.1" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#bootstrap-vs-sampling"><i class="fa fa-check"></i><b>8.7.1</b> Comparing bootstrap and sampling distributions</a></li>
<li class="chapter" data-level="8.7.2" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#theory-ci"><i class="fa fa-check"></i><b>8.7.2</b> Theory-based confidence intervals</a></li>
<li class="chapter" data-level="8.7.3" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#additional-resources-6"><i class="fa fa-check"></i><b>8.7.3</b> Additional resources</a></li>
<li class="chapter" data-level="8.7.4" data-path="8-confidence-intervals.html"><a href="8-confidence-intervals.html#whats-to-come-7"><i class="fa fa-check"></i><b>8.7.4</b> What’s to come?</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="9" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html"><i class="fa fa-check"></i><b>9</b> Hypothesis Testing</a>
<ul>
<li class="chapter" data-level="" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#nhst-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="9.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-activity"><i class="fa fa-check"></i><b>9.1</b> Promotions activity</a>
<ul>
<li class="chapter" data-level="9.1.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#does-gender-affect-promotions-at-a-bank"><i class="fa fa-check"></i><b>9.1.1</b> Does gender affect promotions at a bank?</a></li>
<li class="chapter" data-level="9.1.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#shuffling-once"><i class="fa fa-check"></i><b>9.1.2</b> Shuffling once</a></li>
<li class="chapter" data-level="9.1.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#shuffling-16-times"><i class="fa fa-check"></i><b>9.1.3</b> Shuffling 16 times</a></li>
<li class="chapter" data-level="9.1.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-what-did-we-just-do"><i class="fa fa-check"></i><b>9.1.4</b> What did we just do?</a></li>
</ul></li>
<li class="chapter" data-level="9.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#understanding-ht"><i class="fa fa-check"></i><b>9.2</b> Understanding hypothesis tests</a></li>
<li class="chapter" data-level="9.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-infer"><i class="fa fa-check"></i><b>9.3</b> Conducting hypothesis tests</a>
<ul>
<li class="chapter" data-level="9.3.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#infer-workflow-ht"><i class="fa fa-check"></i><b>9.3.1</b> <code>infer</code> package workflow</a></li>
<li class="chapter" data-level="9.3.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#comparing-infer-workflows"><i class="fa fa-check"></i><b>9.3.2</b> Comparison with confidence intervals</a></li>
<li class="chapter" data-level="9.3.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#only-one-test"><i class="fa fa-check"></i><b>9.3.3</b> “There is only one test”</a></li>
</ul></li>
<li class="chapter" data-level="9.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-interpretation"><i class="fa fa-check"></i><b>9.4</b> Interpreting hypothesis tests</a>
<ul>
<li class="chapter" data-level="9.4.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#trial"><i class="fa fa-check"></i><b>9.4.1</b> Two possible outcomes</a></li>
<li class="chapter" data-level="9.4.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#types-of-errors"><i class="fa fa-check"></i><b>9.4.2</b> Types of errors</a></li>
<li class="chapter" data-level="9.4.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#choosing-alpha"><i class="fa fa-check"></i><b>9.4.3</b> How do we choose alpha?</a></li>
</ul></li>
<li class="chapter" data-level="9.5" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#ht-case-study"><i class="fa fa-check"></i><b>9.5</b> Case study: Are action or romance movies rated higher?</a>
<ul>
<li class="chapter" data-level="9.5.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#imdb-data"><i class="fa fa-check"></i><b>9.5.1</b> IMDb ratings data</a></li>
<li class="chapter" data-level="9.5.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#sampling-scenario-1"><i class="fa fa-check"></i><b>9.5.2</b> Sampling scenario</a></li>
<li class="chapter" data-level="9.5.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#conducting-the-hypothesis-test"><i class="fa fa-check"></i><b>9.5.3</b> Conducting the hypothesis test</a></li>
</ul></li>
<li class="chapter" data-level="9.6" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#nhst-conclusion"><i class="fa fa-check"></i><b>9.6</b> Conclusion</a>
<ul>
<li class="chapter" data-level="9.6.1" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#theory-hypo"><i class="fa fa-check"></i><b>9.6.1</b> Theory-based hypothesis tests</a></li>
<li class="chapter" data-level="9.6.2" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#when-inference-is-not-needed"><i class="fa fa-check"></i><b>9.6.2</b> When inference is not needed</a></li>
<li class="chapter" data-level="9.6.3" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#problems-with-p-values"><i class="fa fa-check"></i><b>9.6.3</b> Problems with p-values</a></li>
<li class="chapter" data-level="9.6.4" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#additional-resources-7"><i class="fa fa-check"></i><b>9.6.4</b> Additional resources</a></li>
<li class="chapter" data-level="9.6.5" data-path="9-hypothesis-testing.html"><a href="9-hypothesis-testing.html#whats-to-come-8"><i class="fa fa-check"></i><b>9.6.5</b> What’s to come</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="10" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html"><i class="fa fa-check"></i><b>10</b> Inference for Regression</a>
<ul>
<li class="chapter" data-level="" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#inf-packages"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="10.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-refresher"><i class="fa fa-check"></i><b>10.1</b> Regression refresher</a>
<ul>
<li class="chapter" data-level="10.1.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#teaching-evaluations-analysis"><i class="fa fa-check"></i><b>10.1.1</b> Teaching evaluations analysis</a></li>
<li class="chapter" data-level="10.1.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#sampling-scenario-2"><i class="fa fa-check"></i><b>10.1.2</b> Sampling scenario</a></li>
</ul></li>
<li class="chapter" data-level="10.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-interp"><i class="fa fa-check"></i><b>10.2</b> Interpreting regression tables</a>
<ul>
<li class="chapter" data-level="10.2.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-se"><i class="fa fa-check"></i><b>10.2.1</b> Standard error</a></li>
<li class="chapter" data-level="10.2.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-test-statistic"><i class="fa fa-check"></i><b>10.2.2</b> Test statistic</a></li>
<li class="chapter" data-level="10.2.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#p-value"><i class="fa fa-check"></i><b>10.2.3</b> p-value</a></li>
<li class="chapter" data-level="10.2.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#confidence-interval"><i class="fa fa-check"></i><b>10.2.4</b> Confidence interval</a></li>
<li class="chapter" data-level="10.2.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-table-computation"><i class="fa fa-check"></i><b>10.2.5</b> How does R compute the table?</a></li>
</ul></li>
<li class="chapter" data-level="10.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#regression-conditions"><i class="fa fa-check"></i><b>10.3</b> Conditions for inference for regression</a>
<ul>
<li class="chapter" data-level="10.3.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#residuals-refresher"><i class="fa fa-check"></i><b>10.3.1</b> Residuals refresher</a></li>
<li class="chapter" data-level="10.3.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#linearity-of-relationship"><i class="fa fa-check"></i><b>10.3.2</b> Linearity of relationship</a></li>
<li class="chapter" data-level="10.3.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#independence-of-residuals"><i class="fa fa-check"></i><b>10.3.3</b> Independence of residuals</a></li>
<li class="chapter" data-level="10.3.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#normality-of-residuals"><i class="fa fa-check"></i><b>10.3.4</b> Normality of residuals</a></li>
<li class="chapter" data-level="10.3.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#equality-of-variance"><i class="fa fa-check"></i><b>10.3.5</b> Equality of variance</a></li>
<li class="chapter" data-level="10.3.6" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#what-is-the-conclusion"><i class="fa fa-check"></i><b>10.3.6</b> What’s the conclusion?</a></li>
</ul></li>
<li class="chapter" data-level="10.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#infer-regression"><i class="fa fa-check"></i><b>10.4</b> Simulation-based inference for regression</a>
<ul>
<li class="chapter" data-level="10.4.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#confidence-interval-for-slope"><i class="fa fa-check"></i><b>10.4.1</b> Confidence interval for slope</a></li>
<li class="chapter" data-level="10.4.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#hypothesis-test-for-slope"><i class="fa fa-check"></i><b>10.4.2</b> Hypothesis test for slope</a></li>
</ul></li>
<li class="chapter" data-level="10.5" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#inference-conclusion"><i class="fa fa-check"></i><b>10.5</b> Conclusion</a>
<ul>
<li class="chapter" data-level="10.5.1" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#theory-regression"><i class="fa fa-check"></i><b>10.5.1</b> Theory-based inference for regression</a></li>
<li class="chapter" data-level="10.5.2" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#summary-of-statistical-inference"><i class="fa fa-check"></i><b>10.5.2</b> Summary of statistical inference</a></li>
<li class="chapter" data-level="10.5.3" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#additional-resources-8"><i class="fa fa-check"></i><b>10.5.3</b> Additional resources</a></li>
<li class="chapter" data-level="10.5.4" data-path="10-inference-for-regression.html"><a href="10-inference-for-regression.html#whats-to-come-9"><i class="fa fa-check"></i><b>10.5.4</b> What’s to come</a></li>
</ul></li>
</ul></li>
<li class="part"><span><b>IV Conclusion</b></span></li>
<li class="chapter" data-level="11" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html"><i class="fa fa-check"></i><b>11</b> Tell Your Story with Data</a>
<ul>
<li class="chapter" data-level="11.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#review"><i class="fa fa-check"></i><b>11.1</b> Review</a>
<ul>
<li class="chapter" data-level="" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#story-packages"><i class="fa fa-check"></i>Needed packages</a></li>
</ul></li>
<li class="chapter" data-level="11.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#seattle-house-prices"><i class="fa fa-check"></i><b>11.2</b> Case study: Seattle house prices</a>
<ul>
<li class="chapter" data-level="11.2.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-EDA-I"><i class="fa fa-check"></i><b>11.2.1</b> Exploratory data analysis: Part I</a></li>
<li class="chapter" data-level="11.2.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-EDA-II"><i class="fa fa-check"></i><b>11.2.2</b> Exploratory data analysis: Part II</a></li>
<li class="chapter" data-level="11.2.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-regression"><i class="fa fa-check"></i><b>11.2.3</b> Regression modeling</a></li>
<li class="chapter" data-level="11.2.4" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#house-prices-making-predictions"><i class="fa fa-check"></i><b>11.2.4</b> Making predictions</a></li>
</ul></li>
<li class="chapter" data-level="11.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#data-journalism"><i class="fa fa-check"></i><b>11.3</b> Case study: Effective data storytelling</a>
<ul>
<li class="chapter" data-level="11.3.1" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#bechdel-test-for-hollywood-gender-representation"><i class="fa fa-check"></i><b>11.3.1</b> Bechdel test for Hollywood gender representation</a></li>
<li class="chapter" data-level="11.3.2" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#us-births-in-1999"><i class="fa fa-check"></i><b>11.3.2</b> US Births in 1999</a></li>
<li class="chapter" data-level="11.3.3" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#scripts-of-r-code"><i class="fa fa-check"></i><b>11.3.3</b> Scripts of R code</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="11-thinking-with-data.html"><a href="11-thinking-with-data.html#concluding-remarks"><i class="fa fa-check"></i>Concluding remarks</a></li>
</ul></li>
<li class="appendix"><span><b>Appendix</b></span></li>
<li class="chapter" data-level="A" data-path="A-appendixA.html"><a href="A-appendixA.html"><i class="fa fa-check"></i><b>A</b> Statistical Background</a>
<ul>
<li class="chapter" data-level="A.1" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-stat-terms"><i class="fa fa-check"></i><b>A.1</b> Basic statistical terms</a>
<ul>
<li class="chapter" data-level="A.1.1" data-path="A-appendixA.html"><a href="A-appendixA.html#mean"><i class="fa fa-check"></i><b>A.1.1</b> Mean</a></li>
<li class="chapter" data-level="A.1.2" data-path="A-appendixA.html"><a href="A-appendixA.html#median"><i class="fa fa-check"></i><b>A.1.2</b> Median</a></li>
<li class="chapter" data-level="A.1.3" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-sd-variance"><i class="fa fa-check"></i><b>A.1.3</b> Standard deviation and variance</a></li>
<li class="chapter" data-level="A.1.4" data-path="A-appendixA.html"><a href="A-appendixA.html#five-number-summary"><i class="fa fa-check"></i><b>A.1.4</b> Five-number summary</a></li>
<li class="chapter" data-level="A.1.5" data-path="A-appendixA.html"><a href="A-appendixA.html#distribution"><i class="fa fa-check"></i><b>A.1.5</b> Distribution</a></li>
<li class="chapter" data-level="A.1.6" data-path="A-appendixA.html"><a href="A-appendixA.html#outliers"><i class="fa fa-check"></i><b>A.1.6</b> Outliers</a></li>
</ul></li>
<li class="chapter" data-level="A.2" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-normal-curve"><i class="fa fa-check"></i><b>A.2</b> Normal distribution</a></li>
<li class="chapter" data-level="A.3" data-path="A-appendixA.html"><a href="A-appendixA.html#appendix-log10-transformations"><i class="fa fa-check"></i><b>A.3</b> log10 transformations</a></li>
</ul></li>
<li class="chapter" data-level="B" data-path="B-appendixB.html"><a href="B-appendixB.html"><i class="fa fa-check"></i><b>B</b> Inference Examples</a>
<ul>
<li class="chapter" data-level="" data-path="B-appendixB.html"><a href="B-appendixB.html#needed-packages-1"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="B.1" data-path="B-appendixB.html"><a href="B-appendixB.html#inference-mind-map"><i class="fa fa-check"></i><b>B.1</b> Inference mind map</a></li>
<li class="chapter" data-level="B.2" data-path="B-appendixB.html"><a href="B-appendixB.html#one-mean"><i class="fa fa-check"></i><b>B.2</b> One mean</a>
<ul>
<li class="chapter" data-level="B.2.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement"><i class="fa fa-check"></i><b>B.2.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.2.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses"><i class="fa fa-check"></i><b>B.2.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.2.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data"><i class="fa fa-check"></i><b>B.2.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.2.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods"><i class="fa fa-check"></i><b>B.2.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.2.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods"><i class="fa fa-check"></i><b>B.2.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.2.6" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results"><i class="fa fa-check"></i><b>B.2.6</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.3" data-path="B-appendixB.html"><a href="B-appendixB.html#one-proportion"><i class="fa fa-check"></i><b>B.3</b> One proportion</a>
<ul>
<li class="chapter" data-level="B.3.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-1"><i class="fa fa-check"></i><b>B.3.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.3.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-1"><i class="fa fa-check"></i><b>B.3.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.3.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-1"><i class="fa fa-check"></i><b>B.3.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.3.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-1"><i class="fa fa-check"></i><b>B.3.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.3.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-1"><i class="fa fa-check"></i><b>B.3.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.3.6" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-1"><i class="fa fa-check"></i><b>B.3.6</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.4" data-path="B-appendixB.html"><a href="B-appendixB.html#two-proportions"><i class="fa fa-check"></i><b>B.4</b> Two proportions</a>
<ul>
<li class="chapter" data-level="B.4.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-2"><i class="fa fa-check"></i><b>B.4.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.4.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-2"><i class="fa fa-check"></i><b>B.4.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.4.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-2"><i class="fa fa-check"></i><b>B.4.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.4.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-2"><i class="fa fa-check"></i><b>B.4.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.4.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-2"><i class="fa fa-check"></i><b>B.4.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.4.6" data-path="B-appendixB.html"><a href="B-appendixB.html#test-statistic-2"><i class="fa fa-check"></i><b>B.4.6</b> Test statistic</a></li>
<li class="chapter" data-level="B.4.7" data-path="B-appendixB.html"><a href="B-appendixB.html#state-conclusion-2"><i class="fa fa-check"></i><b>B.4.7</b> State conclusion</a></li>
<li class="chapter" data-level="B.4.8" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-2"><i class="fa fa-check"></i><b>B.4.8</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.5" data-path="B-appendixB.html"><a href="B-appendixB.html#two-means-independent-samples"><i class="fa fa-check"></i><b>B.5</b> Two means (independent samples)</a>
<ul>
<li class="chapter" data-level="B.5.1" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-3"><i class="fa fa-check"></i><b>B.5.1</b> Problem statement</a></li>
<li class="chapter" data-level="B.5.2" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-3"><i class="fa fa-check"></i><b>B.5.2</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.5.3" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-3"><i class="fa fa-check"></i><b>B.5.3</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.5.4" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-3"><i class="fa fa-check"></i><b>B.5.4</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.5.5" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-3"><i class="fa fa-check"></i><b>B.5.5</b> Traditional methods</a></li>
<li class="chapter" data-level="B.5.6" data-path="B-appendixB.html"><a href="B-appendixB.html#test-statistic-3"><i class="fa fa-check"></i><b>B.5.6</b> Test statistic</a></li>
<li class="chapter" data-level="B.5.7" data-path="B-appendixB.html"><a href="B-appendixB.html#compute-p-value-1"><i class="fa fa-check"></i><b>B.5.7</b> Compute <span class="math inline">\(p\)</span>-value</a></li>
<li class="chapter" data-level="B.5.8" data-path="B-appendixB.html"><a href="B-appendixB.html#state-conclusion-3"><i class="fa fa-check"></i><b>B.5.8</b> State conclusion</a></li>
<li class="chapter" data-level="B.5.9" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-3"><i class="fa fa-check"></i><b>B.5.9</b> Comparing results</a></li>
</ul></li>
<li class="chapter" data-level="B.6" data-path="B-appendixB.html"><a href="B-appendixB.html#two-means-paired-samples"><i class="fa fa-check"></i><b>B.6</b> Two means (paired samples)</a>
<ul>
<li class="chapter" data-level="" data-path="B-appendixB.html"><a href="B-appendixB.html#problem-statement-4"><i class="fa fa-check"></i>Problem statement</a></li>
<li class="chapter" data-level="B.6.1" data-path="B-appendixB.html"><a href="B-appendixB.html#competing-hypotheses-4"><i class="fa fa-check"></i><b>B.6.1</b> Competing hypotheses</a></li>
<li class="chapter" data-level="B.6.2" data-path="B-appendixB.html"><a href="B-appendixB.html#exploring-the-sample-data-4"><i class="fa fa-check"></i><b>B.6.2</b> Exploring the sample data</a></li>
<li class="chapter" data-level="B.6.3" data-path="B-appendixB.html"><a href="B-appendixB.html#non-traditional-methods-4"><i class="fa fa-check"></i><b>B.6.3</b> Non-traditional methods</a></li>
<li class="chapter" data-level="B.6.4" data-path="B-appendixB.html"><a href="B-appendixB.html#traditional-methods-4"><i class="fa fa-check"></i><b>B.6.4</b> Traditional methods</a></li>
<li class="chapter" data-level="B.6.5" data-path="B-appendixB.html"><a href="B-appendixB.html#comparing-results-4"><i class="fa fa-check"></i><b>B.6.5</b> Comparing results</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="C" data-path="C-appendixC.html"><a href="C-appendixC.html"><i class="fa fa-check"></i><b>C</b> Tips and Tricks</a>
<ul>
<li class="chapter" data-level="" data-path="C-appendixC.html"><a href="C-appendixC.html#needed-packages-2"><i class="fa fa-check"></i>Needed packages</a></li>
<li class="chapter" data-level="C.1" data-path="C-appendixC.html"><a href="C-appendixC.html#data-wrangling"><i class="fa fa-check"></i><b>C.1</b> Data wrangling</a>
<ul>
<li class="chapter" data-level="C.1.1" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-missing-values"><i class="fa fa-check"></i><b>C.1.1</b> Dealing with missing values</a></li>
<li class="chapter" data-level="C.1.2" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-reordering-bars"><i class="fa fa-check"></i><b>C.1.2</b> Reordering bars in a barplot</a></li>
<li class="chapter" data-level="C.1.3" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-money-on-axis"><i class="fa fa-check"></i><b>C.1.3</b> Showing money on an axis</a></li>
<li class="chapter" data-level="C.1.4" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-changing-values"><i class="fa fa-check"></i><b>C.1.4</b> Changing values inside cells</a></li>
<li class="chapter" data-level="C.1.5" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-convert-numerical-categorical"><i class="fa fa-check"></i><b>C.1.5</b> Converting a numerical variable to a categorical one</a></li>
<li class="chapter" data-level="C.1.6" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-prop"><i class="fa fa-check"></i><b>C.1.6</b> Computing proportions</a></li>
<li class="chapter" data-level="C.1.7" data-path="C-appendixC.html"><a href="C-appendixC.html#appendix-commas"><i class="fa fa-check"></i><b>C.1.7</b> Dealing with %, commas, and $</a></li>
</ul></li>
<li class="chapter" data-level="C.2" data-path="C-appendixC.html"><a href="C-appendixC.html#interactive-graphics"><i class="fa fa-check"></i><b>C.2</b> Interactive graphics</a>
<ul>
<li class="chapter" data-level="C.2.1" data-path="C-appendixC.html"><a href="C-appendixC.html#interactive-linegraphs"><i class="fa fa-check"></i><b>C.2.1</b> Interactive linegraphs</a></li>
</ul></li>
</ul></li>
<li class="chapter" data-level="D" data-path="D-appendixD.html"><a href="D-appendixD.html"><i class="fa fa-check"></i><b>D</b> Learning Check Solutions</a>
<ul>
<li class="chapter" data-level="D.1" data-path="D-appendixD.html"><a href="D-appendixD.html#chapter-1-solutions"><i class="fa fa-check"></i><b>D.1</b> Chapter 1 Solutions</a></li>
</ul></li>
<li class="chapter" data-level="E" data-path="E-appendixE.html"><a href="E-appendixE.html"><i class="fa fa-check"></i><b>E</b> Versions of R Packages Used</a></li>
<li class="chapter" data-level="" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i>References</a></li>
</ul>
</nav>
</div>
<div class="book-body">
<div class="body-inner">
<div class="book-header" role="navigation">
<h1>
<i class="fa fa-circle-o-notch fa-spin"></i><a href="./">Modern Biological Data Analysis</a>
</h1>
</div>
<div class="page-wrapper" tabindex="-1" role="main">
<div class="page-inner">
<section class="normal" id="section-">
<html>
<img src='https://moderndive.com/wide_format.png' alt="ModernDive">
</html>
<div id="regression" class="section level1" number="5">
<h1><span class="header-section-number">Chapter 5</span> Basic Regression</h1>
<p>Now that we are equipped with data visualization skills from Chapter <a href="2-viz.html#viz">2</a>, data wrangling skills from Chapter <a href="3-wrangling.html#wrangling">3</a>, and an understanding of how to import data and the concept of a “tidy” data format from Chapter <a href="4-tidy.html#tidy">4</a>, let’s now proceed with data modeling. The fundamental premise of data modeling is to make explicit the relationship between:</p>
<ul>
<li>an <em>outcome variable</em> <span class="math inline">\(y\)</span>, also called a <em>dependent variable</em> or response variable, and</li>
<li>an <em>explanatory/predictor variable</em> <span class="math inline">\(x\)</span>, also called an <em>independent variable</em> or covariate.</li>
</ul>
<p>Another way to state this is using mathematical terminology: we will model the outcome variable <span class="math inline">\(y\)</span> “as a function” of the explanatory/predictor variable <span class="math inline">\(x\)</span>. When we say “function” here, we aren’t referring to functions in R like the <code>ggplot()</code> function, but rather to a mathematical function. But why do we have two different labels, explanatory and predictor, for the variable <span class="math inline">\(x\)</span>? That’s because even though the two terms are often used interchangeably, roughly speaking, data modeling serves one of two purposes:</p>
<ol style="list-style-type: decimal">
<li><strong>Modeling for explanation</strong>: When you want to explicitly describe and quantify the relationship between the outcome variable <span class="math inline">\(y\)</span> and a set of explanatory variables <span class="math inline">\(x\)</span>, determine the significance of any relationships, have measures summarizing these relationships, and possibly identify any <em>causal</em> relationships between the variables.</li>
<li><strong>Modeling for prediction</strong>: When you want to predict an outcome variable <span class="math inline">\(y\)</span> based on the information contained in a set of predictor variables <span class="math inline">\(x\)</span>. Unlike modeling for explanation, however, you don’t care so much about understanding how all the variables relate and interact with one another, but rather only whether you can make good predictions about <span class="math inline">\(y\)</span> using the information in <span class="math inline">\(x\)</span>.</li>
</ol>
<p>For example, say you are interested in an outcome variable <span class="math inline">\(y\)</span> of whether patients develop lung cancer and information <span class="math inline">\(x\)</span> on their risk factors, such as smoking habits, age, and socioeconomic status. If we are modeling for explanation, we would be interested in both describing and quantifying the effects of the different risk factors. One reason could be that you want to design an intervention to reduce lung cancer incidence in a population, such as targeting smokers of a specific age group with advertising for smoking cessation programs. If we are modeling for prediction, however, we wouldn’t care so much about understanding how all the individual risk factors contribute to lung cancer, but rather only whether we can make good predictions of which people will contract lung cancer.</p>
<p>In this book, we’ll focus on modeling for explanation and hence refer to <span class="math inline">\(x\)</span> as <em>explanatory variables</em>. If you are interested in learning about modeling for prediction, we suggest you check out books and courses in the field of <em>machine learning</em> such as <a href="http://www-bcf.usc.edu/~gareth/ISL/"><em>An Introduction to Statistical Learning with Applications in R (ISLR)</em></a> <span class="citation">(<a href="#ref-islr2017" role="doc-biblioref">James et al. 2017</a>)</span>. Furthermore, while there exist many techniques for modeling, such as tree-based models and neural networks, in this book we’ll focus on one particular technique: <em>linear regression</em>. Linear regression is one of the most commonly used and easy-to-understand approaches to modeling.</p>
<p>Linear regression involves a <em>numerical</em> outcome variable <span class="math inline">\(y\)</span> and explanatory variables <span class="math inline">\(x\)</span> that are either <em>numerical</em> or <em>categorical</em>. Furthermore, the relationship between <span class="math inline">\(y\)</span> and <span class="math inline">\(x\)</span> is assumed to be linear, or in other words, a line. However, we’ll see that what constitutes a “line” will vary depending on the nature of your explanatory variables <span class="math inline">\(x\)</span>.</p>
<p>In Chapter <a href="5-regression.html#regression">5</a> on basic regression, we’ll only consider models with a single explanatory variable <span class="math inline">\(x\)</span>. In Section <a href="5-regression.html#model1">5.1</a>, the explanatory variable will be numerical. This scenario is known as <em>simple linear regression</em>. In Section <a href="5-regression.html#model2">5.2</a>, the explanatory variable will be categorical.</p>
<p>In Chapter <a href="6-multiple-regression.html#multiple-regression">6</a> on multiple regression, we’ll extend the ideas behind basic regression and consider models with two explanatory variables <span class="math inline">\(x_1\)</span> and <span class="math inline">\(x_2\)</span>. In Section <a href="6-multiple-regression.html#model4">6.1</a>, we’ll have two numerical explanatory variables. In Section <a href="6-multiple-regression.html#model3">6.2</a>, we’ll have one numerical and one categorical explanatory variable. In particular, we’ll consider two such models: <em>interaction</em> and <em>parallel slopes</em> models.</p>
<p>In Chapter <a href="10-inference-for-regression.html#inference-for-regression">10</a> on inference for regression, we’ll revisit our regression models and analyze the results using the tools for <em>statistical inference</em> you’ll develop in Chapters <a href="7-sampling.html#sampling">7</a>, <a href="8-confidence-intervals.html#confidence-intervals">8</a>, and <a href="9-hypothesis-testing.html#hypothesis-testing">9</a> on sampling, bootstrapping and confidence intervals, and hypothesis testing and <span class="math inline">\(p\)</span>-values, respectively.</p>
<p>Let’s now begin with basic regression, which refers to linear regression models with a single explanatory variable <span class="math inline">\(x\)</span>. We’ll also discuss important statistical concepts like the <em>correlation coefficient</em>, that “correlation isn’t necessarily causation,” and what it means for a line to be “best-fitting.”</p>
<div id="reg-packages" class="section level3 unnumbered">
<h3>Needed packages</h3>
<p>Let’s now load all the packages needed for this chapter (this assumes you’ve already installed them). In this chapter, we introduce some new packages:</p>
<ol style="list-style-type: decimal">
<li>The <code>tidyverse</code> “umbrella” <span class="citation">(<a href="#ref-R-tidyverse" role="doc-biblioref">Wickham 2021b</a>)</span> package. Recall from our discussion in Section <a href="4-tidy.html#tidyverse-package">4.4</a> that loading the <code>tidyverse</code> package by running <code>library(tidyverse)</code> loads the following commonly used data science packages all at once:
<ul>
<li><code>ggplot2</code> for data visualization</li>
<li><code>dplyr</code> for data wrangling</li>
<li><code>tidyr</code> for converting data to “tidy” format</li>
<li><code>readr</code> for importing spreadsheet data into R</li>
<li>As well as the more advanced <code>purrr</code>, <code>tibble</code>, <code>stringr</code>, and <code>forcats</code> packages</li>
</ul></li>
<li>The <code>moderndive</code> package of datasets and functions for tidyverse-friendly introductory linear regression.</li>
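<li>The <code>skimr</code> <span class="citation">(<a href="#ref-R-skimr" role="doc-biblioref">Waring et al. 2021</a>)</span> package, which provides a simple-to-use function to quickly compute a wide array of commonly used summary statistics.</li>
<li>The <code>abd</code> package of datasets from the textbook <em>The Analysis of Biological Data</em>, which includes the <code>LionNoses</code> data we use in Section <a href="5-regression.html#model1">5.1</a>.</li>
<li>The <code>gapminder</code> package, which provides country-level data on life expectancy, GDP per capita, and population.</li>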
</ol>
<p>If needed, read Section <a href="1-getting-started.html#packages">1.3</a> for information on how to install and load R packages.</p>
<div class="sourceCode" id="cb137"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb137-1"><a href="5-regression.html#cb137-1" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(tidyverse)</span>
<span id="cb137-2"><a href="5-regression.html#cb137-2" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(moderndive)</span>
<span id="cb137-3"><a href="5-regression.html#cb137-3" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(skimr)</span>
<span id="cb137-4"><a href="5-regression.html#cb137-4" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(abd)</span>
<span id="cb137-5"><a href="5-regression.html#cb137-5" aria-hidden="true" tabindex="-1"></a><span class="fu">library</span>(gapminder)</span></code></pre></div>
</div>
<div id="model1" class="section level2" number="5.1">
<h2><span class="header-section-number">5.1</span> One numerical explanatory variable</h2>
<p>Determining the age of lions living in the wild can be difficult. One hypothesis is that the amount of black pigment in the nose increases with age and that therefore the proportion of black in the nose can be used to estimate a lion’s age. Here we’ll look at the <code>LionNoses</code> data set in the <code>abd</code> package. More information, including a reference to the original study, can be found by typing <code>?LionNoses</code> in the console to view its help file.</p>
<p>Researchers at the University of Minnesota tried to answer the following research question: can the age of a male lion be used to predict the proportion of black on its nose, or is there no relationship between the two? To this end, they used age and nose coloration data from 32 male lions. We’ll answer this question by modeling the relationship between proportion black and age using <em>simple linear regression</em> where we have:</p>
<ol style="list-style-type: decimal">
<li>A numerical outcome variable <span class="math inline">\(y\)</span> (the proportion of the nose that is black) and</li>
<li>A single numerical explanatory variable <span class="math inline">\(x\)</span> (the lion’s age).</li>
</ol>
<div id="model1EDA" class="section level3" number="5.1.1">
<h3><span class="header-section-number">5.1.1</span> Exploratory data analysis</h3>
<p>The data on the 32 male lions can be found in the <code>LionNoses</code> data frame included in the <code>abd</code> package.</p>
<div class="sourceCode" id="cb138"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb138-1"><a href="5-regression.html#cb138-1" aria-hidden="true" tabindex="-1"></a>LionNoses</span></code></pre></div>
<pre><code> age proportion.black
1 1.1 0.21
2 1.5 0.14
3 1.9 0.11
4 2.2 0.13
5 2.6 0.12
6 3.2 0.13
7 3.2 0.12
8 2.9 0.18
9 2.4 0.23
10 2.1 0.22
11 1.9 0.20
12 1.9 0.17
13 1.9 0.15
14 1.9 0.27
15 2.8 0.26
16 3.6 0.21
17 4.3 0.30
18 3.8 0.42
19 4.2 0.43
20 5.4 0.59
21 5.8 0.60
22 6.0 0.72
23 3.4 0.29
24 4.0 0.10
25 7.3 0.48
26 7.3 0.44
27 7.8 0.34
28 7.1 0.37
29 7.1 0.34
30 13.1 0.74
31 8.8 0.79
32 5.4 0.51</code></pre>
<p>A crucial step before doing any kind of analysis or modeling is performing an <em>exploratory data analysis</em>, or EDA for short. EDA gives you a sense of the distributions of the individual variables in your data, whether any potential relationships exist between variables, whether there are outliers and/or missing values, and (most importantly) how to build your model. Here are three common steps in an EDA:</p>
<ol style="list-style-type: decimal">
<li>Most crucially, looking at the raw data values.</li>
<li>Computing summary statistics, such as means, medians, and interquartile ranges.</li>
<li>Creating data visualizations.</li>
</ol>
<p>Let’s perform the first common step in an exploratory data analysis: looking at the raw data values. Because this step seems so trivial, unfortunately many data analysts ignore it. However, getting an early sense of what your raw data looks like can often prevent many larger issues down the road.</p>
<p>You can do this by using RStudio’s spreadsheet viewer or by using the <code>glimpse()</code> function as introduced in Subsection <a href="1-getting-started.html#exploredataframes">1.4.3</a> on exploring data frames:</p>
<div class="sourceCode" id="cb140"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb140-1"><a href="5-regression.html#cb140-1" aria-hidden="true" tabindex="-1"></a><span class="fu">glimpse</span>(LionNoses)</span></code></pre></div>
<pre><code>Rows: 32
Columns: 2
$ age <dbl> 1.1, 1.5, 1.9, 2.2, 2.6, 3.2, 3.2, 2.9, 2.4, 2.1, 1.9…
$ proportion.black <dbl> 0.21, 0.14, 0.11, 0.13, 0.12, 0.13, 0.12, 0.18, 0.23,…</code></pre>
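<p>Looking at the raw data also means checking for missing values. The 32 rows printed above contain none, but for larger data frames a quick base R check is handy. A minimal sketch that counts the <code>NA</code> values in each column of <code>LionNoses</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(abd)

# is.na() flags each missing cell; colSums() counts the flags per column
colSums(is.na(LionNoses))
# Both columns should show 0 for this data set
</code></pre></div>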
<p>Let’s fully describe the variables in <code>LionNoses</code>:</p>
<ol style="list-style-type: decimal">
<li><code>proportion.black</code>: A numerical variable giving the proportion of the lion’s nose that is black. This is the outcome variable <span class="math inline">\(y\)</span> of interest.</li>
<li><code>age</code>: A numerical variable of the male lion’s age. This will be the explanatory variable <span class="math inline">\(x\)</span>.</li>
</ol>
<p>An alternative way to look at the raw data values is by choosing a random sample of the rows in <code>LionNoses</code> by piping it into the <code>sample_n()</code> function from the <code>dplyr</code> package. Here we set the <code>size</code> argument to be <code>5</code>, indicating that we want a random sample of 5 rows. We display the results in Table <a href="5-regression.html#tab:five-random-courses">5.1</a>. Note that due to the random nature of the sampling, you will likely end up with a different subset of 5 rows.</p>
<div class="sourceCode" id="cb142"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb142-1"><a href="5-regression.html#cb142-1" aria-hidden="true" tabindex="-1"></a>LionNoses <span class="sc">%>%</span></span>
<span id="cb142-2"><a href="5-regression.html#cb142-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">sample_n</span>(<span class="at">size =</span> <span class="dv">5</span>)</span></code></pre></div>
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">
<span id="tab:five-random-courses">TABLE 5.1: </span>A random sample of 5 out of the 32 lions
</caption>
<thead>
<tr>
<th style="text-align:right;">
age
</th>
<th style="text-align:right;">
proportion.black
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:right;">
2.6
</td>
<td style="text-align:right;">
0.12
</td>
</tr>
<tr>
<td style="text-align:right;">
1.1
</td>
<td style="text-align:right;">
0.21
</td>
</tr>
<tr>
<td style="text-align:right;">
1.9
</td>
<td style="text-align:right;">
0.15
</td>
</tr>
<tr>
<td style="text-align:right;">
7.1
</td>
<td style="text-align:right;">
0.37
</td>
</tr>
<tr>
<td style="text-align:right;">
3.8
</td>
<td style="text-align:right;">
0.42
</td>
</tr>
</tbody>
</table>
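<p>Because <code>sample_n()</code> draws rows at random, your 5 rows will differ from Table 5.1 unless you fix R’s random number generator first. A minimal sketch, assuming <code>dplyr</code> 1.0.0 or later, where <code>slice_sample()</code> supersedes <code>sample_n()</code>; the seed value 76 is an arbitrary choice of ours:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)
library(abd)

# Fix the random number generator so the same 5 rows are drawn every run
set.seed(76)
LionNoses %>%
  slice_sample(n = 5)
</code></pre></div>
<p>Re-running <code>set.seed(76)</code> immediately before <code>slice_sample()</code> returns the same 5 rows each time; a different seed gives a different, equally valid, random sample.</p>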
<p>Now that we’ve looked at the raw values in our <code>LionNoses</code> data frame and gotten a preliminary sense of the data, let’s move on to the next common step in an exploratory data analysis: computing summary statistics. Let’s start by computing the mean and median of our numerical outcome variable <code>proportion.black</code> and our numerical explanatory variable <code>age</code>. We’ll do this by using the <code>summarize()</code> function from <code>dplyr</code> along with the <code>mean()</code> and <code>median()</code> summary functions we saw in Section <a href="3-wrangling.html#summarize">3.5</a>.</p>
<div class="sourceCode" id="cb143"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb143-1"><a href="5-regression.html#cb143-1" aria-hidden="true" tabindex="-1"></a>LionNoses <span class="sc">%>%</span></span>
<span id="cb143-2"><a href="5-regression.html#cb143-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarize</span>(<span class="at">mean_age =</span> <span class="fu">mean</span>(age), <span class="at">mean_proportion.black =</span> <span class="fu">mean</span>(proportion.black),</span>
<span id="cb143-3"><a href="5-regression.html#cb143-3" aria-hidden="true" tabindex="-1"></a> <span class="at">median_age =</span> <span class="fu">median</span>(age), <span class="at">median_proportion.black =</span> <span class="fu">median</span>(proportion.black))</span></code></pre></div>
<pre><code> mean_age mean_proportion.black median_age median_proportion.black
1 4.31 0.322 3.5 0.265</code></pre>
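<p>Since there is an even number of lions (32), each median is the average of the two middle values once the data are sorted, that is, the 16th and 17th smallest. The following base R sketch checks this for <code>age</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(abd)

ages_sorted <- sort(LionNoses$age)
# With n = 32, the two middle positions are 16 and 17
ages_sorted[16:17]          # 3.4 3.6
mean(ages_sorted[16:17])    # 3.5
median(LionNoses$age)       # 3.5, the same value
</code></pre></div>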
<p>However, what if we want other summary statistics as well, such as the standard deviation (a measure of spread), the minimum and maximum values, and various percentiles?</p>
<p>Typing out all these summary statistic functions in <code>summarize()</code> would be long and tedious. Instead, let’s use the convenient <code>skim()</code> function from the <code>skimr</code> package. This function takes in a data frame, “skims” it, and returns commonly used summary statistics. Let’s take the <code>LionNoses</code> data frame, which contains only the outcome variable <code>proportion.black</code> and the explanatory variable <code>age</code>, and pipe it into the <code>skim()</code> function:</p>
<div class="sourceCode" id="cb145"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb145-1"><a href="5-regression.html#cb145-1" aria-hidden="true" tabindex="-1"></a>LionNoses <span class="sc">%>%</span> <span class="fu">skim</span>()</span></code></pre></div>
<!--
TODO:
Update skimr::skim() output to match v2.0.1
Skipped: Couldn't figure out how to use skim_with(ts = sfl(line_graph = NULL))
at https://cran.r-project.org/web/packages/skimr/vignettes/skimr.html
Used remotes::install_version("skimr", version = "1.0.6") to use that version
instead.
-->
<p>(For formatting purposes in this book, the inline histogram that is usually printed with <code>skim()</code> has been removed. This can be done by using <code>skim_with(numeric = list(hist = NULL))</code> prior to using the <code>skim()</code> function for version 1.0.6 of <code>skimr</code>.)</p>
<p>For the numerical variables <code>age</code> and <code>proportion.black</code> it returns:</p>
<ul>
<li><code>n_missing</code>: the number of missing values</li>
<li><code>complete_rate</code>: the proportion of complete values</li>
<li><code>mean</code>: the average</li>
<li><code>sd</code>: the standard deviation</li>
<li><code>p0</code>: the 0th percentile: the value at which 0% of observations are smaller than it (the <em>minimum</em> value)</li>
<li><code>p25</code>: the 25th percentile: the value at which 25% of observations are smaller than it (the <em>1st quartile</em>)</li>
<li><code>p50</code>: the 50th percentile: the value at which 50% of observations are smaller than it (the <em>2nd</em> quartile and more commonly called the <em>median</em>)</li>
<li><code>p75</code>: the 75th percentile: the value at which 75% of observations are smaller than it (the <em>3rd quartile</em>)</li>
<li><code>p100</code>: the 100th percentile: the value at which 100% of observations are smaller than or equal to it (the <em>maximum</em> value)</li>
</ul>
<p>Looking at this output, we can see how the values of both variables are distributed. For example, the mean proportion of black in the nose was 0.322, whereas the mean age was 4.31 years. Furthermore, the middle 50% of proportion-black values lies between 0.165 and 0.433 (the first and third quartiles), whereas the middle 50% of ages falls within 2.18 to 5.85 years.</p>
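<p>The percentiles reported by <code>skim()</code> can be reproduced with base R’s <code>quantile()</code> function. A minimal sketch; we assume <code>skim()</code> uses the default interpolation method of <code>quantile()</code> (<code>type = 7</code>), which gives the quartiles quoted above:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(abd)

# p0, p25, p50, p75, and p100 for each variable
probs <- c(0, 0.25, 0.5, 0.75, 1)
quantile(LionNoses$age, probs = probs)
# 25%: 2.175 and 75%: 5.85, which round to 2.18 and 5.85
quantile(LionNoses$proportion.black, probs = probs)
# 25%: 0.165 and 75%: 0.4325
</code></pre></div>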
<p>The <code>skim()</code> function only returns what are known as <em>univariate</em> summary statistics: functions that take a single variable and return some numerical summary of that variable. However, there also exist <em>bivariate</em> summary statistics: functions that take in two variables and return some summary of those two variables. In particular, when the two variables are numerical, we can compute the <em>correlation coefficient</em>. Generally speaking, <em>coefficients</em> are quantitative expressions of a specific phenomenon. A <em>correlation coefficient</em> is a quantitative expression of the <em>strength of the linear relationship between two numerical variables</em>. Its value ranges between -1 and 1 where:</p>
<ul>
<li>-1 indicates a perfect <em>negative relationship</em>: As the value of one variable goes up, the value of the other variable goes down, with all points falling exactly on a straight line.</li>
<li>0 indicates no <em>linear</em> relationship: Knowing that one variable goes up tells you nothing about whether the other tends to go up or down along a straight line. (The variables can still be related in a non-linear way.)</li>
<li>+1 indicates a perfect <em>positive relationship</em>: As the value of one variable goes up, the value of the other variable goes up as well, with all points falling exactly on a straight line.</li>
</ul>
<p>Figure <a href="5-regression.html#fig:correlation1">5.1</a> gives examples of 9 different correlation coefficient values for hypothetical numerical variables <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>. For example, observe in the top right plot that for a correlation coefficient of -0.75 there is a negative linear relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span>, but it is not as strong as the negative linear relationship between <span class="math inline">\(x\)</span> and <span class="math inline">\(y\)</span> when the correlation coefficient is -0.9 or -1.</p>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:correlation1"></span>
<img src="ModernDive_files/figure-html/correlation1-1.png" alt="Nine different correlation coefficients." width="\textwidth" />
<p class="caption">
FIGURE 5.1: Nine different correlation coefficients.
</p>
</div>
<p>The correlation coefficient can be computed using the <code>get_correlation()</code> function in the <code>moderndive</code> package. In this case, the inputs to the function are the two numerical variables for which we want to calculate the correlation coefficient.</p>
<p>We put the name of the outcome variable on the left-hand side of the <code>~</code> “tilde” sign, while putting the name of the explanatory variable on the right-hand side. This is known as R’s <em>formula notation</em>. We will use this same “formula” syntax with regression later in this chapter.</p>
<div class="sourceCode" id="cb146"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb146-1"><a href="5-regression.html#cb146-1" aria-hidden="true" tabindex="-1"></a>LionNoses <span class="sc">%>%</span> </span>
<span id="cb146-2"><a href="5-regression.html#cb146-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">get_correlation</span>(<span class="at">formula =</span> proportion.black <span class="sc">~</span> age)</span></code></pre></div>
<pre><code> cor
1 0.79</code></pre>
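<p>The formula passed to <code>get_correlation()</code> above is an ordinary R object, and creating it does not look up any data. This small base-R sketch inspects its two sides. It shows how the outcome variable (left of the <code>~</code>) and the explanatory variable (right of the <code>~</code>) can be told apart:</p>

```r
# Creating a formula does not evaluate or look up the variables
f <- proportion.black ~ age

class(f)         # "formula"
all.vars(f)      # "proportion.black" "age"  (outcome first, then explanatory)
deparse(f[[2]])  # "proportion.black"        (left-hand side: outcome)
deparse(f[[3]])  # "age"                     (right-hand side: explanatory)
```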
<p>An alternative way to compute correlation is to use the <code>cor()</code> summary function within a <code>summarize()</code> command:</p>
<div class="sourceCode" id="cb148"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb148-1"><a href="5-regression.html#cb148-1" aria-hidden="true" tabindex="-1"></a>LionNoses <span class="sc">%>%</span> </span>
<span id="cb148-2"><a href="5-regression.html#cb148-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">summarize</span>(<span class="at">correlation =</span> <span class="fu">cor</span>(proportion.black, age))</span></code></pre></div>
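<p>By default, <code>cor()</code> computes Pearson’s correlation coefficient, <span class="math inline">\(r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}\)</span>. The sketch below recomputes it from this definition and checks the result against <code>cor()</code>. It uses R’s built-in <code>cars</code> data as a stand-in so that it runs without the <code>abd</code> package. Swapping in <code>LionNoses$age</code> and <code>LionNoses$proportion.black</code> should work the same way.</p>

```r
# Built-in `cars` data: speed (mph) and stopping distance (ft)
x <- cars$speed
y <- cars$dist

# Sum of cross-products of deviations from the means, divided by the
# square root of the product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y)  # method = "pearson" is the default

all.equal(r_manual, r_builtin)  # TRUE, up to floating-point tolerance
```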
<p>In our case, the correlation coefficient of 0.79 indicates that the relationship between proportion black and age is “clearly positive.” There is a certain amount of subjectivity in interpreting correlation coefficients, especially those that aren’t close to the extreme values of -1, 0, and 1. To develop your intuition about correlation coefficients, play the “Guess the Correlation” 1980s-style video game mentioned in Subsection <a href="5-regression.html#additional-resources-basic-regression">5.4.1</a>.</p>
<p>Let’s now perform the last of the steps in an exploratory data analysis: creating data visualizations. Since both the <code>proportion.black</code> and <code>age</code> variables are numerical, a scatterplot is an appropriate graph to visualize this data. Let’s do this using <code>geom_point()</code> and display the result in Figure <a href="5-regression.html#fig:numxplot1">5.2</a>.</p>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:numxplot1"></span>
<img src="ModernDive_files/figure-html/numxplot1-1.png" alt="Lion age and nose coloration." width="\textwidth" />
<p class="caption">
FIGURE 5.2: Lion age and nose coloration.
</p>
</div>
<p>Observe that most ages lie between 1 and 5 years, while most nose coloration proportions lie between 0.1 and 0.3 black. Furthermore, while opinions may vary, in our view the relationship between proportion black and age is “clearly positive.” This is consistent with our earlier computed correlation coefficient of 0.79.</p>
<p>Let’s build on the scatterplot in Figure <a href="5-regression.html#fig:numxplot1">5.2</a> by adding a “best-fitting” line: of all possible lines we can draw on this scatterplot, it is the line that “best” fits through the cloud of points. We do this by adding a new <code>geom_smooth(method = "lm", se = FALSE)</code> layer to the <code>ggplot()</code> code that created the scatterplot in Figure <a href="5-regression.html#fig:numxplot1">5.2</a>. The <code>method = "lm"</code> argument sets the line to be a “<code>l</code>inear <code>m</code>odel.” The <code>se = FALSE</code> argument suppresses the shaded band that would otherwise show the <em>standard error</em> uncertainty around the line. (We’ll define the concept of <em>standard error</em> later in Subsection <a href="7-sampling.html#sampling-definitions">7.3.2</a>.)</p>
<div class="sourceCode" id="cb149"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb149-1"><a href="5-regression.html#cb149-1" aria-hidden="true" tabindex="-1"></a><span class="fu">ggplot</span>(LionNoses, <span class="fu">aes</span>(<span class="at">x =</span> age, <span class="at">y =</span> proportion.black)) <span class="sc">+</span></span>
<span id="cb149-2"><a href="5-regression.html#cb149-2" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_point</span>() <span class="sc">+</span></span>
<span id="cb149-3"><a href="5-regression.html#cb149-3" aria-hidden="true" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">x =</span> <span class="st">"Age (years)"</span>, </span>
<span id="cb149-4"><a href="5-regression.html#cb149-4" aria-hidden="true" tabindex="-1"></a> <span class="at">y =</span> <span class="st">"Proportion of black nose"</span>,</span>
<span id="cb149-5"><a href="5-regression.html#cb149-5" aria-hidden="true" tabindex="-1"></a> <span class="at">title =</span> <span class="st">"Scatterplot of relationship of relative coloration and age of male lions."</span>) <span class="sc">+</span> </span>
<span id="cb149-6"><a href="5-regression.html#cb149-6" aria-hidden="true" tabindex="-1"></a> <span class="fu">geom_smooth</span>(<span class="at">method =</span> <span class="st">"lm"</span>, <span class="at">se =</span> <span class="cn">FALSE</span>)</span></code></pre></div>
<div class="figure" style="text-align: center"><span style="display:block;" id="fig:numxplot3"></span>
<img src="ModernDive_files/figure-html/numxplot3-1.png" alt="Regression line." width="\textwidth" />
<p class="caption">
FIGURE 5.3: Regression line.
</p>
</div>
<p>The line in the resulting Figure <a href="5-regression.html#fig:numxplot3">5.3</a> is called a “regression line.” The regression line is a visual summary of the relationship between two numerical variables, in our case the outcome variable <code>proportion.black</code> and the explanatory variable <code>age</code>. The positive slope of the blue line is consistent with our earlier observed correlation coefficient of 0.79, suggesting that there is a positive relationship between these two variables: older lions tend to have noses with a higher proportion of black. We’ll see later, however, that while the correlation coefficient and the slope of a regression line always have the same sign (positive or negative), they typically do not have the same value.</p>
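<p>One way to see why the two always share a sign: with a single explanatory variable, the least-squares slope equals the correlation coefficient rescaled by the two sample standard deviations, <span class="math inline">\(b_1 = r \cdot s_y / s_x\)</span>. Because <span class="math inline">\(s_x\)</span> and <span class="math inline">\(s_y\)</span> are positive, the rescaling can change the size of the number but not its sign. Here is a base-R sketch checking this identity, with the built-in <code>cars</code> data standing in for <code>LionNoses</code>:</p>

```r
# Built-in `cars` data: stopping distance (ft) vs speed (mph)
x <- cars$speed
y <- cars$dist

r  <- cor(x, y)
b1 <- unname(coef(lm(y ~ x))[2])  # slope of the fitted regression line

# Slope = correlation * sd(y) / sd(x); sd() is always positive,
# so b1 and r must have the same sign
b1_from_r <- r * sd(y) / sd(x)

all.equal(b1, b1_from_r)        # TRUE
c(correlation = r, slope = b1)  # same sign, different values
```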
<p>Furthermore, a regression line is “best-fitting” in that it minimizes a certain mathematical criterion. We present this criterion in Subsection <a href="5-regression.html#leastsquares">5.3.2</a>, but we suggest you read this subsection only after first reading the rest of this section on regression with one numerical explanatory variable.</p>
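<p>For a brief preview, <code>lm()</code> (and therefore <code>geom_smooth(method = "lm")</code>) fits the line by <em>least squares</em>. It chooses the intercept and slope that make the sum of squared residuals as small as possible, where a residual is the vertical distance from a point to the line. The sketch below again uses the built-in <code>cars</code> data as a stand-in. It nudges the fitted intercept and slope in a few directions and checks that none of the nudged lines does better. The helper <code>ssr()</code> is defined here for illustration only.</p>

```r
# Built-in `cars` data used as a stand-in for LionNoses
fit <- lm(dist ~ speed, data = cars)
b   <- coef(fit)

# Hypothetical helper: sum of squared residuals for the line b0 + b1 * x
ssr <- function(b0, b1) sum((cars$dist - (b0 + b1 * cars$speed))^2)

ssr_best <- ssr(b[1], b[2])

# Nudge the intercept and slope away from the lm() values (9 combinations)
nudges <- expand.grid(d0 = c(-1, 0, 1), d1 = c(-0.1, 0, 0.1))
ssr_nudged <- mapply(function(d0, d1) ssr(b[1] + d0, b[2] + d1),
                     nudges$d0, nudges$d1)

# No nudged line has a smaller sum of squared residuals
all(ssr_nudged >= ssr_best - 1e-8)  # TRUE
```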
<div class="learncheck">
<p>
<strong><em>Learning check</em></strong>
</p>
</div>
<p><strong>(LC5.1)</strong> Conduct a new exploratory data analysis with the <code>ProgesteroneExercise</code> data set in the <code>abd</code> package. Remember, this involves three things:</p>
<ol style="list-style-type: lower-alpha">
<li>Looking at the raw data values.</li>
<li>Computing summary statistics.</li>
<li>Creating data visualizations.</li>
</ol>
<p>The outcome variable <span class="math inline">\(y\)</span> is <code>ventilation</code> rate and the explanatory variable <span class="math inline">\(x\)</span> is <code>progesterone</code> levels. What can you say about the relationship between progesterone levels and ventilation rate based on this exploration?</p>
<div class="learncheck">
</div>
</div>
<div id="model1table" class="section level3" number="5.1.2">
<h3><span class="header-section-number">5.1.2</span> Simple linear regression</h3>
<p>You may recall from secondary/high school algebra that the equation of a line is <span class="math inline">\(y = a + b\cdot x\)</span>. (Note that the <span class="math inline">\(\cdot\)</span> symbol is equivalent to the <span class="math inline">\(\times\)</span> “multiply by” mathematical symbol. We’ll use the <span class="math inline">\(\cdot\)</span> symbol in the rest of this book as it is more succinct.) It is defined by two coefficients <span class="math inline">\(a\)</span> and <span class="math inline">\(b\)</span>. The intercept coefficient <span class="math inline">\(a\)</span> is the value of <span class="math inline">\(y\)</span> when <span class="math inline">\(x = 0\)</span>. The slope coefficient <span class="math inline">\(b\)</span> for <span class="math inline">\(x\)</span> is the increase in <span class="math inline">\(y\)</span> for every increase of one in <span class="math inline">\(x\)</span>. This is also called the “rise over run.”</p>
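<p>As a small worked example with the hypothetical coefficients <span class="math inline">\(a = 2\)</span> and <span class="math inline">\(b = 3\)</span>, the line <span class="math inline">\(y = 2 + 3 \cdot x\)</span> gives:</p>

```latex
\begin{aligned}
x = 0: &\quad y = 2 + 3 \cdot 0 = 2 \quad \text{(the intercept } a\text{)}\\
x = 1: &\quad y = 2 + 3 \cdot 1 = 5\\
x = 2: &\quad y = 2 + 3 \cdot 2 = 8\\
\text{rise over run} &= \frac{8 - 5}{2 - 1} = 3 = b
\end{aligned}
```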
<p>However, when defining a regression line like the regression line in Figure <a href="5-regression.html#fig:numxplot3">5.3</a>, we use slightly different notation: the equation of the regression line is <span class="math inline">\(\widehat{y} = b_0 + b_1 \cdot x\)</span>. The intercept coefficient is <span class="math inline">\(b_0\)</span>, so <span class="math inline">\(b_0\)</span> is the value of <span class="math inline">\(\widehat{y}\)</span> when <span class="math inline">\(x = 0\)</span>. The slope coefficient for <span class="math inline">\(x\)</span> is <span class="math inline">\(b_1\)</span>, i.e., the increase in <span class="math inline">\(\widehat{y}\)</span> for every increase of one in <span class="math inline">\(x\)</span>. Why do we put a “hat” on top of the <span class="math inline">\(y\)</span>? It’s a form of notation commonly used in regression to indicate that we have a “fitted value,” or the value of <span class="math inline">\(y\)</span> on the regression line for a given <span class="math inline">\(x\)</span> value. We’ll discuss this more in the upcoming Subsection <a href="5-regression.html#model1points">5.1.3</a>.</p>
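<p>In R, the fitted values <span class="math inline">\(\widehat{y}\)</span> of a simple regression fit with <code>lm()</code> are <span class="math inline">\(b_0 + b_1 \cdot x\)</span> evaluated at each observed <span class="math inline">\(x\)</span>. Here is a minimal sketch confirming this. It uses the built-in <code>cars</code> data as a stand-in, with a model that predicts <code>dist</code> from <code>speed</code>:</p>

```r
fit <- lm(dist ~ speed, data = cars)
b0  <- coef(fit)[["(Intercept)"]]
b1  <- coef(fit)[["speed"]]

# Fitted values ("y-hat"): the height of the regression line at each observed x
y_hat <- b0 + b1 * cars$speed

all.equal(unname(fitted(fit)), y_hat)  # TRUE

# The same line can be evaluated at any chosen x, e.g. speed = 21 mph
predict(fit, newdata = data.frame(speed = 21))
```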
<p>We know that the regression line in Figure <a href="5-regression.html#fig:numxplot3">5.3</a> has a positive slope <span class="math inline">\(b_1\)</span> corresponding to our explanatory <span class="math inline">\(x\)</span> variable <code>age</code>. Why? Because lions with higher <code>age</code>s also tend to have higher <code>proportion.black</code> values. However, what is the numerical value of the slope <span class="math inline">\(b_1\)</span>? What about the intercept <span class="math inline">\(b_0\)</span>? Rather than compute these two values by hand, let’s use a computer!</p>
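<p>We’ll leave the arithmetic to the computer, but with a single explanatory variable the least-squares coefficients have a closed form: <span class="math inline">\(b_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\)</span> and <span class="math inline">\(b_0 = \bar{y} - b_1 \cdot \bar{x}\)</span>. The base-R sketch below computes both from these formulas and compares them with <code>lm()</code>. It uses the built-in <code>cars</code> data as a stand-in for <code>LionNoses</code>:</p>

```r
x <- cars$speed
y <- cars$dist

# Least-squares slope: cross-products of deviations over squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Least-squares intercept: the line passes through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))  # TRUE
```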
<p>We can obtain the values of the intercept <span class="math inline">\(b_0\)</span> and the slope for <code>age</code> <span class="math inline">\(b_1\)</span> by outputting a <em>linear regression table</em>. This is done in two steps:</p>
<ol style="list-style-type: decimal">
<li>We first “fit” the linear regression model using the <code>lm()</code> function and save it in <code>lion_model</code>.</li>
<li>We get the regression table by applying the <code>get_regression_table()</code> function from the <code>moderndive</code> package to <code>lion_model</code>.</li>
</ol>
<div class="sourceCode" id="cb150"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb150-1"><a href="5-regression.html#cb150-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Fit regression model:</span></span>
<span id="cb150-2"><a href="5-regression.html#cb150-2" aria-hidden="true" tabindex="-1"></a>lion_model <span class="ot"><-</span> <span class="fu">lm</span>(proportion.black <span class="sc">~</span> age, <span class="at">data =</span> LionNoses)</span>
<span id="cb150-3"><a href="5-regression.html#cb150-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Get regression table:</span></span>
<span id="cb150-4"><a href="5-regression.html#cb150-4" aria-hidden="true" tabindex="-1"></a><span class="fu">get_regression_table</span>(lion_model)</span></code></pre></div>
<table class="table" style="font-size: 16px; margin-left: auto; margin-right: auto;">
<caption style="font-size: initial !important;">
<span id="tab:regtable">TABLE 5.2: </span>Linear regression table
</caption>
<thead>
<tr>
<th style="text-align:left;">
term
</th>
<th style="text-align:right;">
estimate
</th>
<th style="text-align:right;">
std_error
</th>
<th style="text-align:right;">
statistic
</th>
<th style="text-align:right;">
p_value
</th>
<th style="text-align:right;">
lower_ci
</th>
<th style="text-align:right;">
upper_ci
</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left;">
intercept
</td>
<td style="text-align:right;">
0.070
</td>
<td style="text-align:right;">
0.042
</td>
<td style="text-align:right;">
1.66
</td>
<td style="text-align:right;">
0.107
</td>
<td style="text-align:right;">
-0.016
</td>
<td style="text-align:right;">
0.155
</td>
</tr>