basic linear algebra question #28

Open
bwanaaa opened this issue May 11, 2019 · 0 comments
bwanaaa commented May 11, 2019

Thank you for a legendary tutorial on the basics of neural nets and gradient descent. I understood the derivation of the gradient (3 applications of the chain rule! whoa!!), but why did you transpose the matrix ∂z3/∂W? At about 4:45 in part 4 you had to multiply the back-propagating error (delta3, a 3×1 matrix) by a3 (a 3×3 matrix). You commuted delta3 and a3, but matrix multiplication is not commutative. And you transposed a3 to boot!

These two operations seem arbitrary. Why are they valid? Why did you not simply take the dot product of delta3 and a3? That way you would get a 3×1 matrix:
$$
\delta_3 = \begin{bmatrix} S_1 \\ S_2 \\ S_3 \end{bmatrix},
\qquad
a_3 = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} \\
a_{2,1} & a_{2,2} & a_{2,3} \\
a_{3,1} & a_{3,2} & a_{3,3}
\end{bmatrix}
$$

$$
\begin{bmatrix}
S_1 \,(a_{1,1} + a_{1,2} + a_{1,3}) \\
S_2 \,(a_{2,1} + a_{2,2} + a_{2,3}) \\
S_3 \,(a_{3,1} + a_{3,2} + a_{3,3})
\end{bmatrix}
$$
And the result is a 3×1 matrix; I imagine this would be the new gradient?
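
To make the shape problem concrete, here is a minimal numpy sketch (numpy because that is what the tutorial's code uses). The names and the 3×1 / 3×3 shapes are taken from my question above; the numeric values are just placeholders:

```python
import numpy as np

# Shapes follow the question above: delta3 is 3x1, a3 is 3x3.
delta3 = np.array([[0.1], [0.2], [0.3]])   # back-propagating error, shape (3, 1)
a3 = np.arange(1.0, 10.0).reshape(3, 3)    # activation matrix, shape (3, 3)

# In the written order the product is not even defined:
# (3, 1) @ (3, 3) has an inner-dimension mismatch (1 != 3).
try:
    np.dot(delta3, a3)
except ValueError as err:
    print("delta3 . a3 fails:", err)

# Transposing a3 makes the inner dimensions line up:
# (3, 3) @ (3, 1) -> (3, 1), the same shape as the weights being updated.
grad = np.dot(a3.T, delta3)
print(grad.shape)  # (3, 1)
```

So the transposed order is the only one whose dimensions line up with the weight matrix, which is part of what I am asking about.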