-
Notifications
You must be signed in to change notification settings - Fork 22
/
session1-get-started.qmd
207 lines (146 loc) · 7.35 KB
/
session1-get-started.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
---
title: "1: Getting started with Julia"
author: "Claudia Solis-Lemus and Douglas Bates"
subtitle: "ISCB 2022 Madison"
---
# Why Julia?
## From the creators of Julia
"We want a language that is
- open source
- with the speed of C
- obvious, familiar mathematical notation like Matlab
- as usable for general programming as Python
- as easy for statistics as R
- as natural for string processing as Perl
- as powerful for linear algebra as Matlab
- as good at gluing programs together as the shell
- dirt simple to learn, yet keeps the most serious hackers happy"
## Reasons to try Julia
- Comparison with other languages: [Julia touts its speed edge over Python and R](https://www.zdnet.com/article/programming-languages-julia-touts-its-speed-edge-over-python-and-r/)
- Used for large-scale projects like [CliMA 0.1](https://clima.caltech.edu/2020/06/08/clima-0-1-a-first-milestone-in-the-next-generation-of-climate-models/): a first milestone in the next generation of climate models
- [ClimateMachine.jl](https://github.com/CliMA/ClimateMachine.jl)
- Julia adoption accelerated at a rapid pace in 2020:
<div style="text-align:center"><img src="pics/julia-adoption.png" width="450"/></div>
## The five stages of programming
1. Use the REPL as a sophisticated calculator
2. Realize that you are repeating many operations, so you decide to write some functions
3. To organize all your functions, you begin scripting
4. You want to share your code with others and thus, you want to write a package
5. Your package is actually used by others and thus, it should be optimized and have good performance
Julia offers many advantages to data science programmers such as avoid the two-language problem and existing tools that allows programmers to write efficient code without having to write everything from scratch!
# Workshop topics
In this workshop, we will focus on three main topics:
1. Data tools: `Arrow.jl`, `Tables.jl`, `DataFrames.jl`
2. Model fitting: `MixedModels.jl`
3. Package system and important packages for Data Science:
a. interoperability with R and python ([RCall.jl](https://github.com/JuliaInterop/RCall.jl) and [PyCall.jl](https://github.com/JuliaPy/PyCall.jl)),
b. literate programming ([quarto.org](https://quarto.org/) and [Pluto.jl](https://github.com/fonsp/Pluto.jl)),
c. plotting ([Makie ecosystem](https://makie.juliaplots.org/stable/))
In addition, we will have an hands-on exercise in which participants will bring a dataset of their choice along with an existing script in another language like R or python with a specific data analysis. Assisted by the presenters, they will work during the tutorial to convert the code to Julia language.
# The Julia REPL
Let's start by looking at the REPL and try:
- `;` to open shell mode
- `$` to open R mode (after `using RCall`)
- `]` to open package mode: try `status`
- `?` to open help mode
- `\beta+TAB` for math symbols
- `<backspace>` return to Julia mode
# Setting up your project folder
## How do I write code?
![](pics/julia-editors.png)
## Installing dependencies in a project environment
There are two alternatives:
1. Using an existing project with dependencies already in `Project.toml` (this will be the case when you are collaborating with someone that already set up the project dependencies)
2. Set up the dependencies for your project on your own (this will be the case if your project is new)
For this workshop, participants will use [the workshop GitHub repository](https://github.com/crsl4/julia-workshop) as the existing project that already has all the dependencies. However, we also show the steps below to create a new project from scratch.
### 1. Working on an existing project environment
Git clone the repository:
```shell
git clone https://github.com/crsl4/julia-workshop.git
```
Open julia and activate the package with:
```julia
julia --project
```
Alternatively, you can open julia normally, and type `] activate .`
Then, instantiate the package (install dependencies) in the package mode `]`:
```julia
(julia-workshop) pkg> instantiate
(julia-workshop) pkg> update
```
Now, you should be able to follow along the workshop commands. Trouble-shooting might be needed when we reach the interoperability with Python and R as certain libraries or packages might need to be installed too.
### 2. Creating a new project environment
Create a folder in the terminal:
```shell
mkdir myProject
cd myProject
```
Open Julia inside your folder, and activate your environment with:
```julia
(@v1.8) pkg> activate .
```
Install the packages that we need. For example, the packages needed for today's workshop are:
```julia
julia> ENV["PYTHON"] = ""
(myproject) pkg> add PyCall
(myproject) pkg> add IJulia
(myproject) pkg> build IJulia
(myproject) pkg> add MixedModels
(myproject) pkg> add RCall
(myproject) pkg> add Arrow
(myproject) pkg> add DataFrames
(myproject) pkg> add Tables
(myproject) pkg> add RangeTrees
```
**Note:** This will create a whole new conda environment, so it will take up space in memory.
Two files are noteworthy:
- `Project.toml`: Defines project
- `Manifest.toml`: Contains exact list of project dependencies
```julia
shell> head Project.toml
[deps]
Arrow = "69666777-d1a9-59fb-9406-91d4454c9d45"
DataFrames = "a93c6f00-e57d-5684-b7b6-d8193f3e46c0"
IJulia = "7073ff75-c697-5162-941a-fcdaad2a7d2a"
MixedModels = "ff71e718-51f3-5ec2-a782-8ffcbfa3c316"
PyCall = "438e738f-606a-5dbb-bf0a-cddfbfd45ab0"
RCall = "6f49c342-dc21-5d91-9882-a32aef131414"
Tables = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
shell> head Manifest.toml
# This file is machine-generated - editing it directly is not advised
julia_version = "1.8.0-beta3"
manifest_format = "2.0"
project_hash = "01baf737705b090869a607b779c699f83bbeb154"
[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
version = "1.1.1"
```
Look at your `Project.toml` and `Manifest.toml` files after installation. They have all the necessary information about your session.
The packages have a [uuid](https://en.wikipedia.org/wiki/Universally_unique_identifier) string which is the universally unique identifier.
More on the `Project.toml` and `Manifest.toml` files [here](https://julialang.github.io/Pkg.jl/v1/toml-files/#Project-and-Manifest-1).
## Easily share with collaborators
Share your project to colleagues. Send your entire project folder to your colleague, and all they need to do is:
```julia
julia> cd("path/to/project")
pkg> activate .
pkg> instantiate
```
All required packages and dependencies will be installed. Scripts that run in your computer will also run in their computer.
# Editing julia code
There are many ways to work on julia code depending on the editor. We will focus today on VSCode.
1. Download VSCode [here](https://code.visualstudio.com/download)
2. Install the Julia extension and the Quarto extension in VSCode
3. Open the project with:
```shell
cd julia-workshop
code .
```
You need to have the `code` command in your `PATH` which can be done within VSCode searching in the Command Palette "shell command".
Alternatively, you can open a specific `qmd` file in your editor of choice:
```shell
cd julia-workshop
open session2a-tables-and-arrow.qmd
```
If you open a given file, make sure to activate the project `] activate .` or you will be working on the global environment.
Note that jupyter notebooks `ipynb` are produced when the `qmd` files when rendered into `html`.
In addition to `qmd` files, you can work with standard `.jl` julia files.