diff --git a/lectures/applications/heterogeneity.md b/lectures/applications/heterogeneity.md
index 84bb4cfa..28e91f80 100644
--- a/lectures/applications/heterogeneity.md
+++ b/lectures/applications/heterogeneity.md
@@ -101,12 +101,12 @@ When treatment is randomly assigned, we can estimate average treatment effects
 because

 $$
-\begin{align*}
+\begin{aligned}
 E[y_i(1) - y_i(0) ] = & E[y_i(1)] - E[y_i(0)] \\
 & \text{random assignment } \\
 = & E[y_i(1) | d_i = 1] - E[y_i(0) | d_i = 0] \\
 = & E[y_i | d_i = 1] - E[y_i | d_i = 0 ]
-\end{align*}
+\end{aligned}
 $$

 ### Average Treatment Effects
@@ -164,12 +164,12 @@ logic that lets us estimate unconditional average treatment effects also
 suggests that we can estimate conditional average treatment effects.

 $$
-\begin{align*}
+\begin{aligned}
 E[y_i(1) - y_i(0) |X_i=x] = & E[y_i(1)|X_i = x] - E[y_i(0)|X_i=x] \\
 & \text{random assignment } \\
 = & E[y_i(1) | d_i = 1, X_i=x] - E[y_i(0) | d_i = 0, X_i=x] \\
 = & E[y_i | d_i = 1, X_i = x] - E[y_i | d_i = 0, X_i=x ]
-\end{align*}
+\end{aligned}
 $$

 Conditional average treatment effects tell us whether there are
@@ -209,7 +209,6 @@ $S(x)$ approximates $s_0(x)$ is to look at the best linear projection of
 $s_0(x)$ on $S(x)$.

 $$
-\DeclareMathOperator*{\argmin}{arg\,min}
 \beta_0, \beta_1 = \argmin_{b_0,b_1} E[(s_0(x) - b_0 - b_1 (S(x)-E[S(x)]))^2]
 $$
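The difference-in-means identity in the first hunk above is easy to verify numerically. Below is a minimal sketch, assuming simulated data with a known treatment effect of $1 + 0.5x$; the names (`y0`, `y1`, `d`) are illustrative and do not come from the lecture.

```python
# Check that, under random assignment, the difference in observed group means
# recovers E[y(1) - y(0)]; simulated data with a true average effect of 1.0.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)
d = rng.integers(0, 2, size=n)      # randomly assigned treatment indicator
y0 = x + rng.normal(size=n)         # potential outcome without treatment
y1 = y0 + 1.0 + 0.5 * x             # heterogeneous effect: 1 + 0.5 x
y = np.where(d == 1, y1, y0)        # only one potential outcome is observed

print(np.mean(y1 - y0))                     # true ATE, approximately 1.0
print(y[d == 1].mean() - y[d == 0].mean())  # difference in means, also ~1.0
```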
diff --git a/lectures/applications/regression.md b/lectures/applications/regression.md
index 0545d969..e3d5f81e 100644
--- a/lectures/applications/regression.md
+++ b/lectures/applications/regression.md
@@ -123,7 +123,7 @@ only the livable square footage of the home.
 The linear regression model for this situation is

 $$
-\log(\text{price}) = \beta_0 + \beta_1 \text{sqft_living} + \epsilon
+\log(\text{price}) = \beta_0 + \beta_1 \text{sqft\_living} + \epsilon
 $$

 $\beta_0$ and $\beta_1$ are called parameters (also coefficients or
@@ -132,14 +132,14 @@ that best fit the data.

 $\epsilon$ is the error term. It would be unusual for the observed
 $\log(\text{price})$ to be an exact linear function of
-$\text{sqft_living}$. The error term captures the deviation of
-$\log(\text{price})$ from a linear function of $\text{sqft_living}$.
+$\text{sqft\_living}$. The error term captures the deviation of
+$\log(\text{price})$ from a linear function of $\text{sqft\_living}$.

 The linear regression algorithm will choose the parameters that minimize the
 *mean squared error* (MSE) function, which for our example is written.

 $$
-\frac{1}{N} \sum_{i=1}^N \left(\log(\text{price}_i) - (\beta_0 + \beta_1 \text{sqft_living}_i) \right)^2
+\frac{1}{N} \sum_{i=1}^N \left(\log(\text{price}_i) - (\beta_0 + \beta_1 \text{sqft\_living}_i) \right)^2
 $$

 The output of this algorithm is the straight line (hence linear) that passes as
@@ -218,7 +218,7 @@ Suppose that in addition to `sqft_living`, we also wanted to use the `bathrooms`
 In this case, the linear regression model is

 $$
-\log(\text{price}) = \beta_0 + \beta_1 \text{sqft_living} +
+\log(\text{price}) = \beta_0 + \beta_1 \text{sqft\_living} +
 \beta_2 \text{bathrooms} + \epsilon
 $$

@@ -227,7 +227,7 @@ We could keep adding one variable at a time, along with a new $\beta_{j}$ coeffi
 Let's write this equation in vector/matrix form as

 $$
-\underbrace{\begin{bmatrix} \log(\text{price}_1) \\ \log(\text{price}_2) \\ \vdots \\ \log(\text{price}_N)\end{bmatrix}}_Y = \underbrace{\begin{bmatrix} 1 & \text{sqft_living}_1 & \text{bathrooms}_1 \\ 1 & \text{sqft_living}_2 & \text{bathrooms}_2 \\ \vdots & \vdots & \vdots \\ 1 & \text{sqft_living}_N & \text{bathrooms}_N \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\beta} + \epsilon
+\underbrace{\begin{bmatrix} \log(\text{price}_1) \\ \log(\text{price}_2) \\ \vdots \\ \log(\text{price}_N)\end{bmatrix}}_Y = \underbrace{\begin{bmatrix} 1 & \text{sqft\_living}_1 & \text{bathrooms}_1 \\ 1 & \text{sqft\_living}_2 & \text{bathrooms}_2 \\ \vdots & \vdots & \vdots \\ 1 & \text{sqft\_living}_N & \text{bathrooms}_N \end{bmatrix}}_{X} \underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix}}_{\beta} + \epsilon
 $$

 Notice that we can add as many columns to $X$ as we'd like and the linear
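Since the last hunk touches the matrix form $Y = X\beta + \epsilon$, here is a minimal sketch of how that form maps onto code; the data and coefficient values are synthetic and invented for illustration, and only the column names mirror the lecture's `sqft_living` and `bathrooms`.

```python
# Fit log(price) = X beta + eps by least squares, where X stacks a constant,
# sqft_living, and bathrooms, as in the matrix equation above (synthetic data).
import numpy as np

rng = np.random.default_rng(1)
N = 500
sqft_living = rng.uniform(500, 4000, size=N)
bathrooms = rng.integers(1, 5, size=N).astype(float)
log_price = 11.0 + 4e-4 * sqft_living + 0.05 * bathrooms + rng.normal(0, 0.1, N)

X = np.column_stack([np.ones(N), sqft_living, bathrooms])  # N x 3 design matrix
beta, *_ = np.linalg.lstsq(X, log_price, rcond=None)       # minimizes the MSE
print(beta)  # approximately [11.0, 0.0004, 0.05]
```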
diff --git a/lectures/problem_sets/problem_set_3.md b/lectures/problem_sets/problem_set_3.md
index baa5fb79..dfcc6cb2 100644
--- a/lectures/problem_sets/problem_set_3.md
+++ b/lectures/problem_sets/problem_set_3.md
@@ -197,10 +197,10 @@ face value $M$, yield to maturity $i$, and periods to maturity $N$ is

 $$
-\begin{align*}
+\begin{aligned}
 P &= \left(\sum_{n=1}^N \frac{C}{(i+1)^n}\right) + \frac{M}{(1+i)^N} \\
 &= C \left(\frac{1 - (1+i)^{-N}}{i} \right) + M(1+i)^{-N}
-\end{align*}
+\end{aligned}
 $$

 In the code cell below, we have defined variables for `i`, `M` and `C`.
diff --git a/lectures/python_fundamentals/functions.md b/lectures/python_fundamentals/functions.md
index c0c460c0..cdf28262 100644
--- a/lectures/python_fundamentals/functions.md
+++ b/lectures/python_fundamentals/functions.md
@@ -633,10 +633,10 @@ that can be interchanged. That is, the following are identical.

 $$
-\begin{eqnarray}
+\begin{aligned}
 f(K, L) &= z\, K^{\alpha} L^{1-\alpha}\\
 f(K_2, L_2) &= z\, K_2^{\alpha} L_2^{1-\alpha}
-\end{eqnarray}
+\end{aligned}
 $$

 The same concept applies to Python functions, where the arguments are just
diff --git a/lectures/scientific/applied_linalg.md b/lectures/scientific/applied_linalg.md
index 44e1e2fa..45e7cecc 100644
--- a/lectures/scientific/applied_linalg.md
+++ b/lectures/scientific/applied_linalg.md
@@ -343,11 +343,11 @@ $\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}$ then we can multiply both sides b
 to get

 $$
-\begin{align*}
+\begin{aligned}
 \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} &= \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}^{-1}\begin{bmatrix} 3 \\ 4 \end{bmatrix} \\
 I \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} &= \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 4 \end{bmatrix} \\
 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} &= \begin{bmatrix} 1 & 2 \\ 3 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 3 \\ 4 \end{bmatrix}
-\end{align*}
+\end{aligned}
 $$

 Computing the inverse requires that a matrix be square and satisfy some other conditions
diff --git a/lectures/scientific/numpy_arrays.md b/lectures/scientific/numpy_arrays.md
index a01addc6..072e5878 100644
--- a/lectures/scientific/numpy_arrays.md
+++ b/lectures/scientific/numpy_arrays.md
@@ -521,10 +521,10 @@ face value $M$, yield to maturity $i$, and periods to maturity $N$ is

 $$
-\begin{align*}
+\begin{aligned}
 P &= \left(\sum_{n=1}^N \frac{C}{(i+1)^n}\right) + \frac{M}{(1+i)^N} \\
 &= C \left(\frac{1 - (1+i)^{-N}}{i} \right) + M(1+i)^{-N}
-\end{align*}
+\end{aligned}
 $$

 In the code cell below, we have defined variables for `i`, `M` and `C`.
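The bond-pricing identity above, which appears in both problem_set_3.md and numpy_arrays.md, can be confirmed in a few lines. A minimal sketch follows; the parameter values are illustrative, not the ones defined in the lectures' code cells.

```python
# Price the bond two ways: the summation form and the closed form,
# matching the two lines of the display above (illustrative parameters).
import numpy as np

i, M, C, N = 0.05, 100.0, 4.0, 10
P_sum = np.sum(C / (1 + i) ** np.arange(1, N + 1)) + M / (1 + i) ** N
P_closed = C * (1 - (1 + i) ** -N) / i + M * (1 + i) ** -N
print(P_sum, P_closed)  # both approximately 92.28, so the two forms agree
```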
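Similarly, the inverse-based derivation in the applied_linalg.md hunk has a direct numerical counterpart. A small sketch is below; the preference for `np.linalg.solve` over an explicit inverse is standard practice, not something the hunk itself changes.

```python
# Solve the 2x2 system from the applied_linalg.md derivation above.
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([3.0, 4.0])

x_inv = np.linalg.inv(A) @ b     # literal translation of the derivation
x_solve = np.linalg.solve(A, b)  # equivalent, and numerically preferable
print(x_inv, x_solve)            # both give x1 = 1.0, x2 = 1.0
```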