E.g. 1: Deriving PCA by minimizing MSE.
- $x_i \in \mathbb{R}^m$, $i = 1, \dots, n$, is the $i$-th sample with $m$ dimensions. Assume for simplicity that the data has zero mean, i.e. $\sum_{i=1}^{n} x_i = \mathbf{0}$.
- $w_j \in \mathbb{R}^m$, $j = 1, \dots, k$, is the $j$-th basis vector with $m$ dimensions; write $W = [w_1, \dots, w_k]$.
- $z_i = W^\top x_i \in \mathbb{R}^k$ is the low-dimensional representation of $x_i$.
The optimization problem of PCA is

$$\min_{w_1, \dots, w_k} \sum_{i=1}^{n} \left\| x_i - W W^\top x_i \right\|^2 \quad \text{s.t. } W^\top W = I$$

We can simplify the above problem by using $\|a\|^2 = a^\top a$ and $S = \sum_{i=1}^{n} x_i x_i^\top$, as

$$\max_{w_1, \dots, w_k} \sum_{j=1}^{k} w_j^\top S w_j \tag{1}$$

$$\text{s.t. } w_j^\top w_j = 1, \tag{2}$$

$$w_j^\top w_l = 0 \quad (j \neq l) \tag{3}$$
Introducing the Lagrange multipliers $\lambda_{jj}$ and $\lambda_{jl}$ ($j \neq l$), collected into a symmetric matrix $\Lambda = (\lambda_{jl})$, the optimization problem is equivalent to

$$\max_{w_1, \dots, w_k} L = \sum_{j=1}^{k} w_j^\top S w_j - \sum_{j, l} \lambda_{jl} \left( w_j^\top w_l - \delta_{jl} \right)$$

Therefore $\frac{\partial L}{\partial w_j} = 2 S w_j - 2 \sum_{l} \lambda_{jl} w_l$. Let it be $\mathbf{0}$, we get

$$S w_j = \sum_{l} \lambda_{jl} w_l$$

Left multiply the equation by $w_l^\top$ and use eq. (2) - $w_l^\top w_l = 1$ and eq. (3) - $w_l^\top w_j = 0$ ($l \neq j$), we get $\lambda_{jl} = w_l^\top S w_j$, i.e. $\Lambda = W^\top S W$. Rotating $W$ by the orthogonal matrix that diagonalizes $\Lambda$ changes neither the constraints nor the objective, so we may take $\Lambda$ to be diagonal; writing $\lambda_j := \lambda_{jj}$, the stationarity condition becomes

$$S w_j = \lambda_j w_j \tag{4}$$

from which we can see $\lambda_j$ is the eigenvalue of $S$ and $w_j$ is the corresponding eigenvector.
Substitute eq. (4) into eq. (1),

$$\sum_{j=1}^{k} w_j^\top S w_j = \sum_{j=1}^{k} \lambda_j w_j^\top w_j = \sum_{j=1}^{k} \lambda_j$$

therefore $\lambda_1$, $\lambda_2$, ..., $\lambda_k$ should be the largest $k$ eigenvalues of $S$, or equivalently the largest $k$ eigenvalues of the sample covariance matrix $\frac{1}{n} S$.
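The result is easy to check numerically. Below is a minimal sketch (the random test data, the shapes, and the comparison against a random orthonormal basis are my own choices, not part of the derivation): the top-$k$ eigenvectors of $S$ should achieve a lower MSE than another orthonormal basis, and the minimum should equal $\mathrm{tr}(S) - \sum_{j=1}^{k} \lambda_j$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 10, 500, 3                       # dimensions chosen arbitrarily
X = rng.standard_normal((m, n))            # columns are the samples x_i
X -= X.mean(axis=1, keepdims=True)         # enforce the zero-mean assumption

S = X @ X.T                                # S = sum_i x_i x_i^T
eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
W_pca = eigvecs[:, -k:]                    # eigenvectors of the k largest eigenvalues

# A random orthonormal basis for comparison.
Q, _ = np.linalg.qr(rng.standard_normal((m, k)))

def mse(W):
    """The PCA objective: sum_i ||x_i - W W^T x_i||^2."""
    R = X - W @ (W.T @ X)
    return float(np.sum(R ** 2))

print(mse(W_pca) <= mse(Q))                                      # True
print(np.isclose(mse(W_pca), np.trace(S) - eigvals[-k:].sum()))  # True
```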
In the above, we haven't used any differential technique, because we haven't defined the derivative of a vector by a matrix, which would be a 3D tensor. However, in some cases, such as the objective $\sum_{i=1}^{n} \| x_i - W W^\top x_i \|^2$ (w.r.t. $W$), the differential technique still works (see this example).
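For instance, here is a minimal sketch of that computation (assuming the objective is rewritten in matrix form with $X = [x_1, \dots, x_n]$ and the constraint $W^\top W = I$). The objective becomes

$$J = \mathrm{tr}\left( (X - W W^\top X)^\top (X - W W^\top X) \right) = \mathrm{tr}(X^\top X) - \mathrm{tr}(W^\top S W)$$

and the differential of the trace term is

$$d\, \mathrm{tr}(W^\top S W) = \mathrm{tr}(dW^\top\, S W) + \mathrm{tr}(W^\top S\, dW) = \mathrm{tr}\left( (2 S W)^\top dW \right)$$

using $S = S^\top$, so $\frac{\partial\, \mathrm{tr}(W^\top S W)}{\partial W} = 2 S W$, which matches $\frac{\partial L}{\partial w_j} = 2 S w_j$ column by column. The 3D tensor $\partial (W^\top x_i) / \partial W$ never has to appear, because the differential is always taken of a scalar.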
E.g. 2: $F = AXB$, where $A$ is a $p \times m$ matrix and $B$ is an $n \times q$ matrix ($X$ is $m \times n$). Since $\mathrm{vec}(AXB) = (B^\top \otimes A)\, \mathrm{vec}(X)$, the Jacobian is

$$\frac{\partial\, \mathrm{vec}(F)}{\partial\, \mathrm{vec}(X)^\top} = B^\top \otimes A$$
In the above, we haven't used any differential technique, because we haven't defined the derivative of a matrix by a matrix, which would be a 4D tensor. However, in some cases, such as scalar functions of $F$ (e.g. $\mathrm{tr}(AXB)$ w.r.t. $X$), the differential technique still works (see this example). Besides, there is another excellent example of matrix-by-matrix derivatives: the derivative of SVD - https://arxiv.org/pdf/1509.07838.pdf.
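The vec-Kronecker identity behind E.g. 2 is also easy to verify numerically. A minimal sketch (the shapes are arbitrary; `vec` here is column-major vectorization, matching the identity's convention):

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n, q = 3, 4, 5, 2        # arbitrary shapes: A is p x m, X is m x n, B is n x q
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

def vec(M):
    """Column-major (Fortran-order) vectorization: stack the columns of M."""
    return M.reshape(-1, order="F")

# vec(A X B) = (B^T kron A) vec(X), so the Jacobian of vec(F) w.r.t. vec(X) is B^T kron A.
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))   # True
```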