forked from swcarpentry/r-novice-gapminder
-
Notifications
You must be signed in to change notification settings - Fork 0
/
11-writing-data.html
134 lines (131 loc) · 8.82 KB
/
11-writing-data.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Writing data</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>To be able to write out plots and data from R</li>
</ul>
</div>
</section>
<h2 id="saving-plots">Saving plots</h2>
<p>You have already seen how to save the most recent plot you create in <code>ggplot2</code>, using the command <code>ggsave</code>. As a refresher:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">ggsave</span>(<span class="st">"My_most_recent_plot.pdf"</span>)</code></pre></div>
<p>You can save a plot from within RStudio using the ‘Export’ button in the ‘Plot’ window. This will give you the option of saving as a .pdf or as .png, .jpg or other image formats.</p>
<p>Sometimes you will want to save plots without creating them in the ‘Plot’ window first. Perhaps you want to make a pdf document with multiple pages: each one a different plot, for example. Or perhaps you’re looping through multiple subsets of a file, plotting data from each subset, and you want to save each plot, but obviously can’t stop the loop to click ‘Export’ for each one.</p>
<p>In this case you can use a more flexible approach. The function <code>pdf</code> creates a new pdf device. You can control the size and resolution using the arguments to this function.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">pdf</span>(<span class="st">"Life_Exp_vs_time.pdf"</span>, <span class="dt">width=</span><span class="dv">12</span>, <span class="dt">height=</span><span class="dv">4</span>)
<span class="kw">ggplot</span>(<span class="dt">data=</span>gapminder, <span class="kw">aes</span>(<span class="dt">x=</span>year, <span class="dt">y=</span>lifeExp, <span class="dt">colour=</span>country)) +
<span class="st"> </span><span class="kw">geom_line</span>()
<span class="co"># You then have to make sure to turn off the pdf device!</span>
<span class="kw">dev.off</span>()</code></pre></div>
<p>Open up this document and have a look.</p>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-1"><span class="glyphicon glyphicon-pencil"></span>Challenge 1</h2>
</div>
<div class="panel-body">
<p>Rewrite your ‘pdf’ command to print a second page in the pdf, showing a facet plot (hint: use <code>facet_grid</code>) of the same data with one panel per continent.</p>
</div>
</section>
<p>The commands <code>jpeg</code>, <code>png</code> etc. are used similarly to produce documents in different formats.</p>
<h2 id="writing-data">Writing data</h2>
<p>At some point, you’ll also want to write out data from R.</p>
<p>We can use the <code>write.table</code> function for this, which is very similar to <code>read.table</code> from before.</p>
<p>Let’s create a data-cleaning script, for this analysis, we only want to focus on the gapminder data for Australia:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">aust_subset <-<span class="st"> </span>gapminder[gapminder$country ==<span class="st"> "Australia"</span>,]
<span class="kw">write.table</span>(aust_subset,
<span class="dt">file=</span><span class="st">"cleaned-data/gapminder-aus.csv"</span>,
<span class="dt">sep=</span><span class="st">","</span>
)</code></pre></div>
<p>Let’s switch back to the shell to take a look at the data to make sure it looks OK:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">head cleaned-data/gapminder-aus.csv</code></pre></div>
<pre class="output"><code>"country","year","pop","continent","lifeExp","gdpPercap"
"61","Australia",1952,8691212,"Oceania",69.12,10039.59564
"62","Australia",1957,9712569,"Oceania",70.33,10949.64959
"63","Australia",1962,10794968,"Oceania",70.93,12217.22686
"64","Australia",1967,11872264,"Oceania",71.1,14526.12465
"65","Australia",1972,13177000,"Oceania",71.93,16788.62948
"66","Australia",1977,14074100,"Oceania",73.49,18334.19751
"67","Australia",1982,15184200,"Oceania",74.74,19477.00928
"68","Australia",1987,16257249,"Oceania",76.32,21888.88903
"69","Australia",1992,17481977,"Oceania",77.56,23424.76683
</code></pre>
<p>Hmm, that’s not quite what we wanted. Where did all these quotation marks come from? Also the row numbers are meaningless.</p>
<p>Let’s look at the help file to work out how to change this behaviour.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">?write.table</code></pre></div>
<p>By default R will wrap character vectors with quotation marks when writing out to file. It will also write out the row and column names.</p>
<p>Let’s fix this:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">write.table</span>(
gapminder[gapminder$country ==<span class="st"> "Australia"</span>,],
<span class="dt">file=</span><span class="st">"cleaned-data/gapminder-aus.csv"</span>,
<span class="dt">sep=</span><span class="st">","</span>, <span class="dt">quote=</span><span class="ot">FALSE</span>, <span class="dt">row.names=</span><span class="ot">FALSE</span>
)</code></pre></div>
<p>Now lets look at the data again using our shell skills:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">head cleaned-data/gapminder-aus.csv</code></pre></div>
<pre class="output"><code>country,year,pop,continent,lifeExp,gdpPercap
Australia,1952,8691212,Oceania,69.12,10039.59564
Australia,1957,9712569,Oceania,70.33,10949.64959
Australia,1962,10794968,Oceania,70.93,12217.22686
Australia,1967,11872264,Oceania,71.1,14526.12465
Australia,1972,13177000,Oceania,71.93,16788.62948
Australia,1977,14074100,Oceania,73.49,18334.19751
Australia,1982,15184200,Oceania,74.74,19477.00928
Australia,1987,16257249,Oceania,76.32,21888.88903
Australia,1992,17481977,Oceania,77.56,23424.76683
</code></pre>
<p>That looks better!</p>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-2"><span class="glyphicon glyphicon-pencil"></span>Challenge 2</h2>
</div>
<div class="panel-body">
<p>Write a data-cleaning script file that subsets the gapminder data to include only data points collected since 1990.</p>
<p>Use this script to write out the new subset to a file in the <code>cleaned-data/</code> directory.</p>
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:admin@software-carpentry.org">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>