Merge pull request #219 from natashawatkins/patch-3
Correct typos in back_prop.md
jstac authored Mar 27, 2022
2 parents 2ca3637 + 6d63f03 commit 309273e
Showing 1 changed file with 10 additions and 11 deletions.
21 changes: 10 additions & 11 deletions lectures/back_prop.md
@@ -37,7 +37,7 @@ We'll describe the following concepts that are brick and mortar for neural netwo
* an activation function
* a network of neurons
* A neural network as a composition of functions
-* back-propogation and its relationship to the chain rule of differential calculus
+* back-propagation and its relationship to the chain rule of differential calculus


## A Deep (but not Wide) Artificial Neural Network
@@ -172,22 +172,22 @@ $$ (eq:sgd)
where $\frac{d {\mathcal L}}{dx_{N+1}}=-\left(x_{N+1}-y\right)$ and $\alpha > 0 $ is a step size.
-(See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method)) to gather insights about how stochastic gradient descent
+(See [this](https://en.wikipedia.org/wiki/Gradient_descent#Description) and [this](https://en.wikipedia.org/wiki/Newton%27s_method) to gather insights about how stochastic gradient descent
relates to Newton's method.)
To implement one step of this parameter update rule, we want the vector of derivatives $\frac{dx_{N+1}}{dp_k}$.
-In the neural network literature, this step is accomplished by what is known as **back propogation**
+In the neural network literature, this step is accomplished by what is known as **back propagation**.
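
As a rough sketch of one such update (the function name, step size, and example numbers below are made up for illustration and are not the lecture's code), the chain rule combines $\frac{d {\mathcal L}}{dx_{N+1}}$ with $\frac{dx_{N+1}}{dp_k}$, and the parameters then move a small step against that gradient:

```python
import jax.numpy as jnp

def update_params(p, dL_dx, dx_dp, alpha=0.01):
    """Schematic parameter update in the spirit of {eq}`eq:sgd`.

    p      : flat parameter vector (hypothetical)
    dL_dx  : scalar dL/dx_{N+1}, e.g. -(x_{N+1} - y) as stated above
    dx_dp  : vector of derivatives dx_{N+1}/dp, assumed already computed
    """
    dL_dp = dL_dx * dx_dp        # chain rule: dL/dp = (dL/dx_{N+1}) * (dx_{N+1}/dp)
    return p - alpha * dL_dp     # standard descent step; exact signs follow {eq}`eq:sgd`

# example with made-up numbers
p = jnp.array([0.1, -0.2, 0.3])
dx_dp = jnp.array([0.5, 1.0, -0.4])
p_new = update_params(p, dL_dx=-(1.2 - 1.0), dx_dp=dx_dp)
```
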
-## Back Propogation and the Chain Rule
+## Back Propagation and the Chain Rule
Thanks to properties of
* the chain and product rules for differentiation from differential calculus, and
* lower triangular matrices
-back propogation can actually be accomplished in one step by
+back propagation can actually be accomplished in one step by
* inverting a lower triangular matrix, and
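
As a tiny stand-in example of the lower triangular idea (the matrix and vector below are invented for illustration and are not the lecture's objects), the chain-rule recursion across layers can be stacked into a unit lower triangular system and solved in a single substitution pass:

```python
import jax.numpy as jnp
from jax.scipy.linalg import solve_triangular

# Hypothetical strictly lower triangular A standing in for the
# layer-to-layer chain-rule coefficients; the values are made up.
A = jnp.array([[0.0, 0.0, 0.0, 0.0],
               [0.5, 0.0, 0.0, 0.0],
               [0.0, 0.3, 0.0, 0.0],
               [0.0, 0.0, 0.2, 0.0]])
b = jnp.array([1.0, 0.0, 0.0, 0.0])

L = jnp.eye(4) - A                        # unit lower triangular
z = solve_triangular(L, b, lower=True)    # one forward-substitution pass
print(z)                                  # all derivatives recovered at once
```
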
@@ -284,7 +284,7 @@ We can then solve the above problem by applying our update for $p$ multiple time
Choosing a training set amounts to a choice of measure $\mu$ in the above formulation of our function approximation problem as a minimization problem.
-In this spirit, we shall use a uniform grid of, say, 50 or 200 or $\ldots$ points.
+In this spirit, we shall use a uniform grid of, say, 50 or 200 points.
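
For concreteness, such a grid might be built as follows; the interval and the number of points are placeholder choices rather than the lecture's actual settings.

```python
import jax.numpy as jnp

# Illustrative uniform training grid; 50 points on a made-up interval.
x_grid = jnp.linspace(0.5, 3.0, 50)
```
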
There are many possible approaches to the minimization problem posed above:
@@ -294,7 +294,7 @@ There are many possible approaches to the minimization problem posed above:
* something in-between (so-called "mini-batch gradient descent")
-The update rule {eq}`eq:sgd` described above amounts to a stochastic gradient descent algorithm
+The update rule {eq}`eq:sgd` described above amounts to a stochastic gradient descent algorithm.
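
To see how the mini-batch variant differs from the pure stochastic update in {eq}`eq:sgd`, here is a hedged sketch; `loss_fn`, the batching scheme, and the hyperparameters are placeholders rather than the lecture's own implementation, and `loss_fn` is assumed to return a scalar for a whole batch.

```python
import jax
import jax.numpy as jnp

def minibatch_gd(loss_fn, params, xs, ys, alpha=0.01, batch_size=10, epochs=5, seed=0):
    """Illustrative mini-batch gradient descent.

    loss_fn(params, x_batch, y_batch) is assumed to return a scalar
    (e.g. an average loss over the batch); params may be any pytree.
    """
    grad_fn = jax.grad(loss_fn)                     # derivative w.r.t. params
    key = jax.random.PRNGKey(seed)
    n = xs.shape[0]
    for _ in range(epochs):
        key, subkey = jax.random.split(key)
        perm = jax.random.permutation(subkey, n)    # shuffle the training grid
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            grads = grad_fn(params, xs[idx], ys[idx])
            params = jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)
    return params
```
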
```{code-cell} ipython3
from IPython.display import Image
@@ -356,7 +356,6 @@ def loss(params, x, y):
preds = xs[-1]
return 1 / 2 * (y - preds) ** 2
```
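
One way to obtain the parameter derivatives that feed the update rule from a scalar loss like this is automatic differentiation; the lines below are a sketch that assumes `params`, `x`, and `y` exist as in the surrounding (partly hidden) code, and are not the lecture's own training loop.

```python
import jax

grad_loss = jax.grad(loss)   # gradient with respect to the first argument, params
# grads = grad_loss(params, x, y)                          # pytree shaped like params
# new_params = jax.tree_util.tree_map(
#     lambda p, g: p - 0.01 * g, params, grads)            # one descent step, step size 0.01
```
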
```{code-cell} ipython3
@@ -512,8 +511,8 @@ Image(fig.to_image(format="png"))
It is fun to think about how deepening the neural net for the above example affects the quality of approximation
-* if the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
-* other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
+* If the network is too deep, you'll run into the [vanishing gradient problem](http://neuralnetworksanddeeplearning.com/chap5.html)
+* Other parameters such as the step size and the number of epochs can be as important or more important than the number of layers in the situation considered in this lecture.
* Indeed, since $f$ is a linear function of $x$, a one-layer network with the identity map as an activation would probably work best.
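
As a quick check of the last point (with a made-up linear target and hyperparameters, not necessarily those used in the lecture), a single layer with the identity activation is just an affine map, and plain gradient descent on squared error recovers a linear $f$ essentially exactly:

```python
import jax
import jax.numpy as jnp

# Hypothetical linear target; the lecture's actual f may differ.
f = lambda x: -3.0 * x + 2.0

x = jnp.linspace(0.5, 3.0, 50)
y = f(x)

def predict(params, x):
    w, b = params
    return w * x + b              # identity activation: just an affine map

def mse(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

params = (0.0, 0.0)
grad_mse = jax.grad(mse)
for _ in range(2000):
    g = grad_mse(params, x, y)
    params = tuple(p - 0.1 * gp for p, gp in zip(params, g))

print(params)   # should be close to (-3.0, 2.0)
```
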
@@ -598,4 +597,4 @@ print(xla_bridge.get_backend().platform)
**Cloud Environment:** This lecture site is built in a server environment that doesn't have access to a `gpu`
If you run this lecture locally this lets you know where your code is being executed, either
via the `cpu` or the `gpu`
```
```
