Thank you for a legendary tutorial on the basics of neural nets and gradient descent. I understood the derivation of the gradient (3 applications of the chain rule! whoa!!), but why did you transpose the matrix ∂z3/∂W? At about 4:45 in part 4 you had to multiply the back-propagating error delta3 (a 3x1 matrix) by a3 (a 3x3 matrix). You commuted delta3 and a3, but matrix multiplication is not commutative.
And you transposed a3 to boot!
These two operations seem arbitrary. Why are they valid?
Why did you not simply take the dot product of delta3 and a3? That way you would get a 3x1 matrix:
delta3 (a 3x1 column):

S1
S2
S3

a3 (3x3, one row per line):

a1-1  a1-2  a1-3
a2-1  a2-2  a2-3
a3-1  a3-2  a3-3

and the row-by-row result:

S1 * (a1-1 + a1-2 + a1-3)
S2 * (a2-1 + a2-2 + a2-3)
S3 * (a3-1 + a3-2 + a3-3)
And the result would be a 3x1 matrix, which I imagine is the new gradient?
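To make the shapes concrete, here is a minimal NumPy sanity check (the numeric values are made up; delta3 and a3 are just the matrices described above):

```python
import numpy as np

# Back-propagating error: 3x1, one row per training example.
delta3 = np.array([[0.1],
                   [0.2],
                   [0.3]])

# Activation matrix: 3x3, rows = training examples, columns = units.
a3 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0]])

# np.dot(delta3, a3) raises a ValueError: (3x1)(3x3) does not conform,
# so the two factors cannot be multiplied in the original order.

# Transposed and reordered, as in the video: entry j is the
# sum over examples i of a3[i, j] * delta3[i] -- one number per weight.
dJdW = np.dot(a3.T, delta3)
print(dJdW.shape)      # (3, 1)

# The row-sum version written out above: entry i is
# delta3[i] * (sum over columns j of a3[i, j]) -- one number per example.
row_sums = delta3 * a3.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (3, 1)
```

If I am reading the shapes right, both come out 3x1, but they sum over different axes, so I suspect that is where I am going wrong?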