Thank you for a legendary tutorial on the basics of neural nets and gradient descent. I understood the derivation of the gradient (3 applications of the chain rule! whoa!!), but why did you transpose the matrix ∂z3/∂W? At about 4:45 in part 4 you had to multiply the back-propagating error delta3 (a 3x1 matrix) by a3 (a 3x3 matrix). You commuted delta3 and a3, but matrix multiplication is not commutative.
And you transposed a3 to boot!
These two operations seem arbitrary. Why are they valid?
Why did you not simply take the dot product of delta3 and a3? That way you would get a 3x1 matrix:
delta3 (a 3x1 column):

S1
S2
S3

a3 (3x3, one row per line):

a1-1  a1-2  a1-3
a2-1  a2-2  a2-3
a3-1  a3-2  a3-3

and the row-by-row result:

S1 * (a1-1 + a1-2 + a1-3)
S2 * (a2-1 + a2-2 + a2-3)
S3 * (a3-1 + a3-2 + a3-3)
And the result would be a 3x1 matrix, which I imagine is the new gradient?
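To make the shapes concrete, here is a minimal NumPy sanity check (the numeric values are made up; delta3 and a3 are just the matrices described above):

```python
import numpy as np

# Back-propagating error: 3x1, one row per training example.
delta3 = np.array([[0.1],
                   [0.2],
                   [0.3]])

# Activation matrix: 3x3, rows = training examples, columns = units.
a3 = np.array([[1.0, 2.0, 3.0],
               [4.0, 5.0, 6.0],
               [7.0, 8.0, 9.0]])

# np.dot(delta3, a3) raises a ValueError: (3x1)(3x3) does not conform,
# so the two factors cannot be multiplied in the original order.

# Transposed and reordered, as in the video: entry j is the
# sum over examples i of a3[i, j] * delta3[i] -- one number per weight.
dJdW = np.dot(a3.T, delta3)
print(dJdW.shape)      # (3, 1)

# The row-sum version written out above: entry i is
# delta3[i] * (sum over columns j of a3[i, j]) -- one number per example.
row_sums = delta3 * a3.sum(axis=1, keepdims=True)
print(row_sums.shape)  # (3, 1)
```

If I am reading the shapes right, both come out 3x1, but they sum over different axes, so I suspect that is where I am going wrong?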