-
Notifications
You must be signed in to change notification settings - Fork 5
/
Copy pathhomework.qmd
172 lines (137 loc) · 3.1 KB
/
homework.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
title: "Homework 03 - Functions, Profiling, and Rcpp"
format:
html:
embed-resources: true
---
# Due date
Thursday, October 31
# Background
For this assignment, you'll be quested with speeding up some code usint g what
you have learned about vectorization and Rcpp.
## Part 1: Vectorizing code
The following functions can be written to be more efficient without using
parallel computing:
1. This function generates a `n x k` dataset with all its entries distributed
Poisson with mean `lambda`.
```{r}
#| label: p1-fun1
#| eval: false
fun1 <- function(n = 100, k = 4, lambda = 4) {
x <- NULL
for (i in 1:n)
x <- rbind(x, rpois(k, lambda))
return(x)
}
fun1alt <- function(n = 100, k = 4, lambda = 4) {
# YOUR CODE HERE
}
# Benchmarking
bench::mark(
fun1(),
fun1alt(), relative = TRUE, check = FALSE
)
```
2. Like before, speed up the following functions (it is OK to use StackOverflow)
```{r}
#| label: p1-fun2
#| eval: false
# Total row sums
fun1 <- function(mat) {
n <- nrow(mat)
ans <- double(n)
for (i in 1:n) {
ans[i] <- sum(mat[i, ])
}
ans
}
fun1alt <- function(mat) {
# YOUR CODE HERE
}
# Cumulative sum by row
fun2 <- function(mat) {
n <- nrow(mat)
k <- ncol(mat)
ans <- mat
for (i in 1:n) {
for (j in 2:k) {
ans[i,j] <- mat[i, j] + ans[i, j - 1]
}
}
ans
}
fun2alt <- function(mat) {
# YOUR CODE HERE
}
# Use the data with this code
set.seed(2315)
dat <- matrix(rnorm(200 * 100), nrow = 200)
# Test for the first
bench::mark(
fun1(dat),
fun1alt(dat), relative = TRUE
)
# Test for the second
bench::mark(
fun2(dat),
fun2alt(dat), relative = TRUE
)
```
3. Find the column max (hint: Check out the function `max.col()`).
```{r}
#| label: p1-fun3
#| eval: false
# Data Generating Process (10 x 10,000 matrix)
set.seed(1234)
x <- matrix(rnorm(1e4), nrow=10)
# Find each column's max value
fun2 <- function(x) {
apply(x, 2, max)
}
fun2alt <- function(x) {
# YOUR CODE HERE
}
# Benchmarking
bench::mark(
fun2(),
fun2alt(), relative = TRUE
)
```
## Part 2: C++
As we saw in the Rcpp week, vectorization may not be the best solution. For this
part, you must write a function using Rcpp that implements the [scale-free algorithm](https://gvegayon.github.io/networks-udd2024/02-random-graphs.html#scale-free-networks). The following code implements this in R:
```{r}
## Model parameters
n <- 500
m <- 2
## Generating the graph
set.seed(3312)
g <- matrix(0, nrow = n, ncol = n)
g[1:m, 1:m] <- 1
diag(g) <- 0
## Adding nodes
for (i in (m + 1):n) {
# Selecting the nodes to connect to
ids <- sample(
x = 1:(i-1), # Up to i-1
size = m, # m nodes
replace = FALSE, # No replacement
# Probability proportional to the degree
prob = colSums(g[, 1:(i-1), drop = FALSE])
)
# Adding the edges
g[i, ids] <- 1
g[ids, i] <- 1
}
## Visualizing the degree distribution
library(ggplot2)
data.frame(degree = colSums(g)) |>
ggplot(aes(degree)) +
geom_histogram() +
scale_x_log10() +
labs(
x = "Degree\n(log10 scale)",
y = "Count"
)
```
Rewrite the function that generates the scale-free network using Rcpp.