From 8e0059bfdd6284cc9389bf66bc7cfc9f14ce987a Mon Sep 17 00:00:00 2001 From: thomassargent30 Date: Wed, 16 Feb 2022 16:54:23 -0700 Subject: [PATCH] Tom's Feb 16 edits of two lectures --- lectures/likelihood_ratio_process.md | 4 +- lectures/multivariate_normal.md | 1175 +++++++++++++------------- 2 files changed, 609 insertions(+), 570 deletions(-) diff --git a/lectures/likelihood_ratio_process.md b/lectures/likelihood_ratio_process.md index 92cd40c26..b31555b48 100644 --- a/lectures/likelihood_ratio_process.md +++ b/lectures/likelihood_ratio_process.md @@ -232,7 +232,7 @@ Mathematical induction implies $E\left[L\left(w^{t}\right)\bigm|q=g\right]=1$ for all $t \geq 1$. -## Peculiar Property of Likelihood Ratio Process +## Peculiar Property How can $E\left[L\left(w^{t}\right)\bigm|q=g\right]=1$ possibly be true when most probability mass of the likelihood ratio process is piling up near $0$ as @@ -545,7 +545,7 @@ control tests during World War II. A Navy Captain who had been ordered to perform tests of this kind had doubts about it that he presented to Milton Friedman, as we describe in {doc}`this lecture `. -## Kullback–Leibler divergence +## Kullback–Leibler Divergence Now let’s consider a case in which neither $g$ nor $f$ generates the data. diff --git a/lectures/multivariate_normal.md b/lectures/multivariate_normal.md index 28f36e691..d8934dd9a 100644 --- a/lectures/multivariate_normal.md +++ b/lectures/multivariate_normal.md @@ -35,12 +35,12 @@ In this lecture, you will learn formulas for * marginal distributions for all subvectors of $x$ * conditional distributions for subvectors of $x$ conditional on other subvectors of $x$ -We will use the multivariate normal distribution to formulate some classic models: +We will use the multivariate normal distribution to formulate some useful models: -* a **factor analytic model** of an intelligence quotient, i.e., IQ -* a **factor analytic model** of two independent inherent abilities, mathematical and verbal. +* a factor analytic model of an intelligence quotient, i.e., IQ +* a factor analytic model of two independent inherent abilities, say, mathematical and verbal. * a more general factor analytic model -* PCA as an approximation to a factor analytic model +* Principal Components Analysis (PCA) as an approximation to a factor analytic model * time series generated by linear stochastic difference equations * optimal linear filtering theory @@ -56,9 +56,9 @@ For a multivariate normal distribution it is very convenient that - conditional distributions are characterized by multivariate linear regressions -We apply our Python class to some classic examples. +We apply our Python class to some examples. -We will use the following imports: +We use the following imports: ```{code-cell} ipython %matplotlib inline @@ -82,6 +82,8 @@ where $\mu=Ez$ is the mean of the random vector $z$ and $\Sigma=E\left(z-\mu\right)\left(z-\mu\right)^\prime$ is the covariance matrix of $z$. +The covariance matrix $\Sigma$ is symmetric and positive definite. 
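A quick way to check this requirement in code is to attempt a Cholesky factorization, which succeeds only for symmetric positive definite matrices. Here is a small sketch (it assumes `numpy` is imported as `np`, as in the import cell above; `Σ_example` is purely illustrative and is not used later):

```{code-cell} python3
# an illustrative candidate covariance matrix
Σ_example = np.array([[1., .5],
                      [.5, 2.]])

# symmetry check
print(np.allclose(Σ_example, Σ_example.T))

# np.linalg.cholesky raises LinAlgError unless its argument is positive definite
np.linalg.cholesky(Σ_example)
```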
+ ```{code-cell} python3 @njit def f(z, μ, Σ): @@ -110,10 +112,14 @@ def f(z, μ, Σ): return (2 * np.pi) ** (-N/2) * temp1 * temp2 ``` -For some integer $k\in \{2,\dots, N-1\}$, partition +For some integer $k\in \{1,\dots, N-1\}$, partition $z$ as -$z=\left[\begin{array}{c} z_{1}\\ z_{2} \end{array}\right]$, where -$z_1$ is an $\left(N-k\right)\times1$ vector and $z_2$ + +$$ +z=\left[\begin{array}{c} z_{1}\\ z_{2} \end{array}\right], +$$ + + where $z_1$ is an $\left(N-k\right)\times1$ vector and $z_2$ is a $k\times1$ vector. Let @@ -161,7 +167,7 @@ $$ $$ is an $\left(N-k\right) \times k$ matrix of **population -regression coefficients** of $z_1 - \mu_1$ on $z_2 - \mu_2$. +regression coefficients** of the $(N -k) \times 1$ random vector $z_1 - \mu_1$ on the $k \times 1$ random vector $z_2 - \mu_2$. The following class constructs a multivariate normal distribution instance with two methods. @@ -251,7 +257,7 @@ trivariate example. We’ll compute population moments of some conditional distributions using our `MultivariateNormal` class. -Then for fun we’ll compute sample analogs of the associated population +For fun we’ll also compute sample analogs of the associated population regressions by generating simulations and then computing linear least squares regressions. @@ -267,14 +273,14 @@ $$ 0\\ 0 \end{array}\right],\quad\Sigma=\left[\begin{array}{cc} -1 & .2\\ -.2 & 1 +1 & .5\\ +.5 & 2 \end{array}\right] $$ ```{code-cell} python3 μ = np.array([0., 0.]) -Σ = np.array([[1., .2], [.2 ,1.]]) +Σ = np.array([[1., .5], [.5 ,2.]]) # construction of the multivariate normal instance multi_normal = MultivariateNormal(μ, Σ) @@ -288,7 +294,25 @@ multi_normal.partition(k) multi_normal.βs[0] ``` -Let’s compute the mean and variance of the distribution of $z_1$ + +To illustrate the idea that you _can regress anything on anything else_, let's first compute the mean and variance of the distribution of $z_2$ +conditional on $z_1=5$. + +After that we'll reverse what are on the left and right sides of the regression. + + + +```{code-cell} python3 +# compute the cond. dist. of z1 +ind = 1 +z1 = np.array([5.]) # given z1 + +μ2_hat, Σ2_hat = multi_normal.cond_dist(ind, z1) +print('μ2_hat, Σ2_hat = ', μ2_hat, Σ2_hat) +``` + + +Now let’s compute the mean and variance of the distribution of $z_1$ conditional on $z_2=5$. ```{code-cell} python3 @@ -357,8 +381,8 @@ compare it with $\hat{\mu}_1$ Thus, in each case, for our very large sample size, the sample analogues closely approximate their population counterparts. -These close approximations are foretold by a version of a Law of Large -Numbers. +A Law of Large +Numbers explains why sample analogues approximate population objects. ## Trivariate Example @@ -444,10 +468,10 @@ $$ \theta = \mu_{\theta} + \sigma_{\theta} w_{n+1}. $$ -We assume the noise in the test scores is IID and not correlated with +We assume that the noises $\{w_i\}_{i=1}^N$ in the test scores are IID and not correlated with IQ. -In particular, we assume $\{w_i\}_{i=1}^{n+1}$ are i.i.d. standard +We also assume that $\{w_i\}_{i=1}^{n+1}$ are i.i.d. standard normal: $$ @@ -461,7 +485,7 @@ w_{n+1} \end{array}\right]\sim N\left(0,I_{n+1}\right) $$ -The following system describes the random vector $X$ that +The following system describes the $(n+1) \times 1$ random vector $X$ that interests us: $$ @@ -531,7 +555,7 @@ Assume we have recorded $50$ test scores and we know that $\mu_{\theta}=100$, $\sigma_{\theta}=10$, and $\sigma_{y}=10$. 
-We can compute the mean vector and covariance matrix of $x$ easily +We can compute the mean vector and covariance matrix of $X$ easily with our `construct_moments_IQ` function as follows. ```{code-cell} python3 @@ -546,6 +570,8 @@ We can now use our `MultivariateNormal` class to construct an instance, then partition the mean vector and covariance matrix as we wish. +We want to regress IQ, the random variable $\theta$ (_what we don't know_), on the vector $y$ of test scores (_what we do know_). + We choose `k=n` so that $z_{1} = y$ and $z_{2} = \theta$. ```{code-cell} python3 @@ -572,11 +598,13 @@ y = x[:-1] # test scores θ ``` -The method `cond_dist` takes test scores as input and returns the +The method `cond_dist` takes test scores $y$ as input and returns the conditional normal distribution of the IQ $\theta$. -Note that now $\theta$ is what we denoted as $z_{2}$ in the -general case so we need to set `ind=1`. +In the following code, `ind` sets the variables on the right side of the regression. + +Given the way we have defined the vector $X$, we want to set `ind=1` in order to make $\theta$ the left side variable in the +population regression. ```{code-cell} python3 ind = 1 @@ -586,7 +614,7 @@ multi_normal_IQ.cond_dist(ind, y) The first number is the conditional mean $\hat{\mu}_{\theta}$ and the second is the conditional variance $\hat{\Sigma}_{\theta}$. -How do the additional test scores affect our inferences? +How do additional test scores affect our inferences? To shed light on this, we compute a sequence of conditional distributions of $\theta$ by varying the number of test scores in @@ -659,11 +687,11 @@ approach $\theta$. Thus, each $y_{i}$ adds information about $\theta$. -If we drove the number of tests $n \rightarrow + \infty$, the +If we were to drive the number of tests $n \rightarrow + \infty$, the conditional standard deviation $\hat{\sigma}_{\theta}$ would -converge to $0$ at the rate $\frac{1}{n^{.5}}$. +converge to $0$ at rate $\frac{1}{n^{.5}}$. -## Another representation +## Information as Surprise By using a different representation, let’s look at things from a different perspective. @@ -693,7 +721,9 @@ $$ \epsilon \sim N(0, I) . $$ -Let $G=C^{-1}$; $G$ is also lower triangular. +Let $G=C^{-1}$ + +$G$ is also lower triangular. We can compute $\epsilon$ from the formula @@ -776,13 +806,13 @@ np.max(np.abs(μθ_hat_arr - μθ_hat_arr_C)) < 1e-10 np.max(np.abs(Σθ_hat_arr - Σθ_hat_arr_C)) < 1e-10 ``` -## Magic of the Cholesky factorization +## Cholesky Factor Magic -Evidently, the Cholesky factorization is automatically computing the +Evidently, the Cholesky factorizations automatically computes the population **regression coefficients** and associated statistics that are produced by our `MultivariateNormal` class. -The Cholesky factorization is computing things **recursively**. +The Cholesky factorization computes these things **recursively**. Indeed, in formula {eq}`mnv_1`, @@ -792,7 +822,7 @@ Indeed, in formula {eq}`mnv_1`, - the coefficient $c_i$ is the simple population regression coefficient of $\theta - \mu_\theta$ on $\epsilon_i$ -## Math and Verbal Components of Intelligence +## Math and Verbal Intelligence We can alter the preceding example to be more realistic. @@ -1261,801 +1291,810 @@ y This example is an instance of what is known as a **Wold representation** in time series analysis. 
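To round out this discussion, here is a small sketch of the orthogonalization that underlies such a representation (the matrix `Σ_demo` below is purely illustrative, and `np` is assumed to be imported as above): the lower triangular Cholesky factor of a covariance matrix maps i.i.d. standard normal shocks into a random vector with that covariance, and its inverse recovers the shocks.

```{code-cell} python3
# an illustrative covariance matrix
Σ_demo = np.array([[2., .8, .3],
                   [.8, 1.5, .6],
                   [.3, .6, 1.]])

C_demo = np.linalg.cholesky(Σ_demo)   # lower triangular, Σ_demo = C C'
G_demo = np.linalg.inv(C_demo)        # maps x - μ back into orthogonal shocks

print(np.allclose(C_demo @ C_demo.T, Σ_demo))

# simulate and verify that ε = G (x - μ) has (approximately) identity covariance
μ_demo = np.zeros(3)
x_sim = np.random.multivariate_normal(μ_demo, Σ_demo, size=200_000)
ε_sim = (x_sim - μ_demo) @ G_demo.T
print(np.round(np.cov(ε_sim.T), 2))
```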
-## Classic Factor Analysis Model -The factor analysis model widely used in psychology and other fields can -be represented as +## Stochastic Difference Equation + +Consider the stochastic second-order linear difference equation $$ -Y = \Lambda f + U +y_{t} = \alpha_{0} + \alpha_{1} y_{y-1} + \alpha_{2} y_{t-2} + u_{t} $$ -where - -1. $Y$ is $n \times 1$ random vector, - $E U U^{\prime} = D$ is a diagonal matrix, -1. $\Lambda$ is $n \times k$ coefficient matrix, -1. $f$ is $k \times 1$ random vector, - $E f f^{\prime} = I$, -1. $U$ is $n \times 1$ random vector, and $U \perp f$. -1. It is presumed that $k$ is small relative to $n$; often - $k$ is only $1$ or $2$, as in our IQ examples. - -This implies that +where $u_{t} \sim N \left(0, \sigma_{u}^{2}\right)$ and $$ -\begin{aligned} -\Sigma_y = E Y Y^{\prime} = \Lambda \Lambda^{\prime} + D \\ -E Y f^{\prime} = \Lambda \\ -E f Y^{\prime} = \Lambda^{\prime} -\end{aligned} +\left[\begin{array}{c} +y_{-1}\\ +y_{0} +\end{array}\right]\sim N\left(\mu_{\tilde{y}},\Sigma_{\tilde{y}}\right) $$ -Thus, the covariance matrix $\Sigma_Y$ is the sum of a diagonal -matrix $D$ and a positive semi-definite matrix -$\Lambda \Lambda^{\prime}$ of rank $k$. +It can be written as a stacked system -This means that all covariances among the $n$ components of the -$Y$ vector are intermediated by their common dependencies on the -$k<$ factors. +$$ +\underset{\equiv A}{\underbrace{\left[\begin{array}{cccccccc} +1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\ +-\alpha_{1} & 1 & 0 & 0 & \cdots & 0 & 0 & 0\\ +-\alpha_{2} & -\alpha_{1} & 1 & 0 & \cdots & 0 & 0 & 0\\ +0 & -\alpha_{2} & -\alpha_{1} & 1 & \cdots & 0 & 0 & 0\\ +\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots\\ +0 & 0 & 0 & 0 & \cdots & -\alpha_{2} & -\alpha_{1} & 1 +\end{array}\right]}}\left[\begin{array}{c} +y_{1}\\ +y_{2}\\ +y_{3}\\ +y_{4}\\ +\vdots\\ +y_{T} +\end{array}\right]=\underset{\equiv b}{\underbrace{\left[\begin{array}{c} +\alpha_{0}+\alpha_{1}y_{0}+\alpha_{2}y_{-1}\\ +\alpha_{0}+\alpha_{2}y_{0}\\ +\alpha_{0}\\ +\alpha_{0}\\ +\vdots\\ +\alpha_{0} +\end{array}\right]}} +$$ -Form +We can compute $y$ by solving the system $$ -Z=\left(\begin{array}{c} -f\\ -Y -\end{array}\right) +y = A^{-1} \left(b + u\right) $$ -the covariance matrix of the expanded random vector $Z$ can be -computed as +We have $$ -\Sigma_{z} = EZZ^{\prime}=\left(\begin{array}{cc} -I & \Lambda^{\prime}\\ -\Lambda & \Lambda\Lambda^{\prime}+D -\end{array}\right) +\begin{aligned} +\mu_{y} = A^{-1} \mu_{b} \\ +\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^{\prime}\right] \left(A^{-1}\right)^{\prime} \\ + &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^{\prime} +\end{aligned} $$ -In the following, we first construct the mean vector and the covariance -matrix for the case where $N=10$ and $k=2$. 
+where -```{code-cell} python3 -N = 10 -k = 2 -``` +$$ +\mu_{b}=\left[\begin{array}{c} +\alpha_{0}+\alpha_{1}\mu_{y_{0}}+\alpha_{2}\mu_{y_{-1}}\\ +\alpha_{0}+\alpha_{2}\mu_{y_{0}}\\ +\alpha_{0}\\ +\vdots\\ +\alpha_{0} +\end{array}\right] +$$ -We set the coefficient matrix $\Lambda$ and the covariance matrix -of $U$ to be +$$ +\Sigma_{b}=\left[\begin{array}{cc} +C\Sigma_{\tilde{y}}C^{\prime} & \boldsymbol{0}_{N-2\times N-2}\\ +\boldsymbol{0}_{N-2\times2} & \boldsymbol{0}_{N-2\times N-2} +\end{array}\right],\quad C=\left[\begin{array}{cc} +\alpha_{2} & \alpha_{1}\\ +0 & \alpha_{2} +\end{array}\right] +$$ $$ -\Lambda=\left(\begin{array}{cc} -1 & 0\\ -\vdots & \vdots\\ -1 & 0\\ -0 & 1\\ -\vdots & \vdots\\ -0 & 1 -\end{array}\right),\quad D=\left(\begin{array}{cccc} +\Sigma_{u}=\left[\begin{array}{cccc} \sigma_{u}^{2} & 0 & \cdots & 0\\ 0 & \sigma_{u}^{2} & \cdots & 0\\ \vdots & \vdots & \vdots & \vdots\\ 0 & 0 & \cdots & \sigma_{u}^{2} -\end{array}\right) +\end{array}\right] $$ -where the first half of the first column of $\Lambda$ is filled -with $1$s and $0$s for the rest half, and symmetrically -for the second column. $D$ is a diagonal matrix with parameter -$\sigma_{u}^{2}$ on the diagonal. - ```{code-cell} python3 -Λ = np.zeros((N, k)) -Λ[:N//2, 0] = 1 -Λ[N//2:, 1] = 1 +# set parameters +T = 80 +T = 160 +# coefficients of the second order difference equation +𝛼0 = 10 +𝛼1 = 1.53 +𝛼2 = -.9 -σu = .5 -D = np.eye(N) * σu ** 2 +# variance of u +σu = 1. +σu = 10. + +# distribution of y_{-1} and y_{0} +μy_tilde = np.array([1., 0.5]) +Σy_tilde = np.array([[2., 1.], [1., 0.5]]) ``` ```{code-cell} python3 -# compute Σy -Σy = Λ @ Λ.T + D -``` +# construct A and A^{\prime} +A = np.zeros((T, T)) -We can now construct the mean vector and the covariance matrix for -$Z$. +for i in range(T): + A[i, i] = 1 -```{code-cell} python3 -μz = np.zeros(k+N) + if i-1 >= 0: + A[i, i-1] = -𝛼1 -Σz = np.empty((k+N, k+N)) + if i-2 >= 0: + A[i, i-2] = -𝛼2 -Σz[:k, :k] = np.eye(k) -Σz[:k, k:] = Λ.T -Σz[k:, :k] = Λ -Σz[k:, k:] = Σy +A_inv = np.linalg.inv(A) ``` ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) +# compute the mean vectors of b and y +μb = np.full(T, 𝛼0) +μb[0] += 𝛼1 * μy_tilde[1] + 𝛼2 * μy_tilde[0] +μb[1] += 𝛼2 * μy_tilde[1] -f = z[:k] -y = z[k:] +μy = A_inv @ μb ``` ```{code-cell} python3 -multi_normal_factor = MultivariateNormal(μz, Σz) -multi_normal_factor.partition(k) -``` +# compute the covariance matrices of b and y +Σu = np.eye(T) * σu ** 2 -Let’s compute the conditional distribution of the hidden factor -$f$ on the observations $Y$, namely, $f \mid Y=y$. +Σb = np.zeros((T, T)) -```{code-cell} python3 -multi_normal_factor.cond_dist(0, y) +C = np.array([[𝛼2, 𝛼1], [0, 𝛼2]]) +Σb[:2, :2] = C @ Σy_tilde @ C.T + +Σy = A_inv @ (Σb + Σu) @ A_inv.T ``` -We can verify that the conditional mean -$E \left[f \mid Y=y\right] = B Y$ where -$B = \Lambda^{\prime} \Sigma_{y}^{-1}$. +## Application to Stock Price Model -```{code-cell} python3 -B = Λ.T @ np.linalg.inv(Σy) +Let -B @ y -``` +$$ +p_{t} = \sum_{j=0}^{T-t} \beta^{j} y_{t+j} +$$ -Similarly, we can compute the conditional distribution $Y \mid f$. 
+Form -```{code-cell} python3 -multi_normal_factor.cond_dist(1, f) -``` +$$ +\underset{\equiv p}{\underbrace{\left[\begin{array}{c} +p_{1}\\ +p_{2}\\ +p_{3}\\ +\vdots\\ +p_{T} +\end{array}\right]}}=\underset{\equiv B}{\underbrace{\left[\begin{array}{ccccc} +1 & \beta & \beta^{2} & \cdots & \beta^{T-1}\\ +0 & 1 & \beta & \cdots & \beta^{T-2}\\ +0 & 0 & 1 & \cdots & \beta^{T-3}\\ +\vdots & \vdots & \vdots & \vdots & \vdots\\ +0 & 0 & 0 & \cdots & 1 +\end{array}\right]}}\left[\begin{array}{c} +y_{1}\\ +y_{2}\\ +y_{3}\\ +\vdots\\ +y_{T} +\end{array}\right] +$$ -It can be verified that the mean is -$\Lambda I^{-1} f = \Lambda f$. +we have + +$$ +\begin{aligned} +\mu_{p} = B \mu_{y} \\ +\Sigma_{p} = B \Sigma_{y} B^{\prime} +\end{aligned} +$$ ```{code-cell} python3 -Λ @ f +β = .96 ``` -## PCA as Approximation to Factor Analytic Model +```{code-cell} python3 +# construct B +B = np.zeros((T, T)) -For fun, let’s apply a Principal Components Analysis (PCA) decomposition -to a covariance matrix $\Sigma_y$ that in fact is governed by our factor-analytic -model. - -Technically, this means that the PCA model is misspecified. (Can you -explain why?) - -Nevertheless, this exercise will let us study how well the first two -principal components from a PCA can approximate the conditional -expectations $E f_i | Y$ for our two factors $f_i$, -$i=1,2$ for the factor analytic model that we have assumed truly -governs the data on $Y$ we have generated. +for i in range(T): + B[i, i:] = β ** np.arange(0, T-i) +``` -So we compute the PCA decomposition +Denote $$ -\Sigma_{y} = P \tilde{\Lambda} P^{\prime} +z=\left[\begin{array}{c} +y\\ +p +\end{array}\right]=\underset{\equiv D}{\underbrace{\left[\begin{array}{c} +I\\ +B +\end{array}\right]}} y $$ -where $\tilde{\Lambda}$ is a diagonal matrix. - -We have +Thus, $\{y_t\}_{t=1}^{T}$ and $\{p_t\}_{t=1}^{T}$ jointly +follow the multivariate normal distribution +$N \left(\mu_{z}, \Sigma_{z}\right)$, where $$ -Y = P \epsilon +\mu_{z}=D\mu_{y} $$ -and - $$ -\epsilon = P^\prime Y +\Sigma_{z}=D\Sigma_{y}D^{\prime} $$ -Note that we will arrange the eigenvectors in $P$ in the -*descending* order of eigenvalues. - ```{code-cell} python3 -𝜆_tilde, P = np.linalg.eigh(Σy) - -# arrange the eigenvectors by eigenvalues -ind = sorted(range(N), key=lambda x: 𝜆_tilde[x], reverse=True) - -P = P[:, ind] -𝜆_tilde = 𝜆_tilde[ind] -Λ_tilde = np.diag(𝜆_tilde) - -print('𝜆_tilde =', 𝜆_tilde) +D = np.vstack([np.eye(T), B]) ``` ```{code-cell} python3 -# verify the orthogonality of eigenvectors -np.abs(P @ P.T - np.eye(N)).max() +μz = D @ μy +Σz = D @ Σy @ D.T ``` -```{code-cell} python3 -# verify the eigenvalue decomposition is correct -P @ Λ_tilde @ P.T -``` +We can simulate paths of $y_{t}$ and $p_{t}$ and compute the +conditional mean $E \left[p_{t} \mid y_{t-1}, y_{t}\right]$ using +the `MultivariateNormal` class. 
```{code-cell} python3 -ε = P.T @ y - -print("ε = ", ε) +z = np.random.multivariate_normal(μz, Σz) +y, p = z[:T], z[T:] ``` ```{code-cell} python3 -# print the values of the two factors - -print('f = ', f) -``` - -Below we’ll plot several things +cond_Ep = np.empty(T-1) -- the $N$ values of $y$ -- the $N$ values of the principal components $\epsilon$ -- the value of the first factor $f_1$ plotted only for the first - $N/2$ observations of $y$ for which it receives a - non-zero loading in $\Lambda$ -- the value of the second factor $f_2$ plotted only for the final - $N/2$ observations for which it receives a non-zero loading in - $\Lambda$ +sub_μ = np.empty(3) +sub_Σ = np.empty((3, 3)) +for t in range(2, T+1): + sub_μ[:] = μz[[t-2, t-1, T-1+t]] + sub_Σ[:, :] = Σz[[t-2, t-1, T-1+t], :][:, [t-2, t-1, T-1+t]] -```{code-cell} python3 -plt.scatter(range(N), y, label='y') -plt.scatter(range(N), ε, label='$\epsilon$') -plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') -plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') -plt.legend() + multi_normal = MultivariateNormal(sub_μ, sub_Σ) + multi_normal.partition(2) -plt.show() + cond_Ep[t-2] = multi_normal.cond_dist(1, y[t-2:t])[0][0] ``` -Consequently, the first two $\epsilon_{j}$ correspond to the -largest two eigenvalues. - -Let’s look at them, after which we’ll look at $E f | y = B y$ - ```{code-cell} python3 -ε[:2] -``` +plt.plot(range(1, T), y[1:], label='$y_{t}$') +plt.plot(range(1, T), y[:-1], label='$y_{t-1}$') +plt.plot(range(1, T), p[1:], label='$p_{t}$') +plt.plot(range(1, T), cond_Ep, label='$Ep_{t}|y_{t}, y_{t-1}$') -```{code-cell} python3 -# compare with Ef|y -B @ y +plt.xlabel('t') +plt.legend(loc=1) +plt.show() ``` -The fraction of variance in $y_{t}$ explained by the first two -principal components can be computed as below. +In the above graph, the green line is what the price of the stock would +be if people had perfect foresight about the path of dividends while the +green line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would +be if people did not have perfect foresight but were optimally +predicting future dividends on the basis of the information +$y_t, y_{t-1}$ at time $t$. -```{code-cell} python3 -𝜆_tilde[:2].sum() / 𝜆_tilde.sum() -``` +## Filtering Foundations -Compute +Assume that $x_0$ is an $n \times 1$ random vector and that +$y_0$ is a $p \times 1$ random vector determined by the +*observation equation* $$ -\hat{Y} = P_{j} \epsilon_{j} + P_{k} \epsilon_{k} +y_0 = G x_0 + v_0 , \quad x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0), \quad v_0 \sim {\mathcal N}(0, R) $$ -where $P_{j}$ and $P_{k}$ correspond to the largest two -eigenvalues. - -```{code-cell} python3 -y_hat = P[:, :2] @ ε[:2] -``` - -In this example, it turns out that the projection $\hat{Y}$ of -$Y$ on the first two principal components does a good job of -approximating $Ef \mid y$. - -We confirm this in the following plot of $f$, -$E y \mid f$, $E f \mid y$, and $\hat{y}$ on the -coordinate axis versus $y$ on the ordinate axis. 
- -```{code-cell} python3 -plt.scatter(range(N), Λ @ f, label='$Ey|f$') -plt.scatter(range(N), y_hat, label='$\hat{y}$') -plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') -plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') - -Efy = B @ y -plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$') -plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$') -plt.legend() - -plt.show() -``` +where $v_0$ is orthogonal to $x_0$, $G$ is a +$p \times n$ matrix, and $R$ is a $p \times p$ +positive definite matrix. -The covariance matrix of $\hat{Y}$ can be computed by first -constructing the covariance matrix of $\epsilon$ and then use the -upper left block for $\epsilon_{1}$ and $\epsilon_{2}$. +We consider the problem of someone who -```{code-cell} python3 -Σεjk = (P.T @ Σy @ P)[:2, :2] +* *observes* $y_0$ +* does not observe $x_0$, +* knows $\hat x_0, \Sigma_0, G, R$ and therefore the joint probability distribution of the vector $\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ +* wants to infer $x_0$ from $y_0$ in light of what he knows about that +joint probability distribution. -Pjk = P[:, :2] +Therefore, the person wants to construct the probability distribution of +$x_0$ conditional on the random vector $y_0$. -Σy_hat = Pjk @ Σεjk @ Pjk.T -print('Σy_hat = \n', Σy_hat) -``` +The joint distribution of +$\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ is multivariate normal +${\mathcal N}(\mu, \Sigma)$ with -## Stochastic Difference Equation +$$ +\mu = \begin{bmatrix} \hat x_0 \cr G \hat x_0 \end{bmatrix} , \quad + \Sigma = \begin{bmatrix} \Sigma_0 & \Sigma_0 G' \cr + G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix} +$$ -Consider the stochastic second-order linear difference equation +By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix +$\hat \Sigma_{11}$ of $z_1$ conditional on $z_2$, we find that the probability distribution of +$x_0$ conditional on $y_0$ is +${\mathcal N}(\tilde x_0, \tilde \Sigma_0)$ where $$ -y_{t} = \alpha_{0} + \alpha_{1} y_{y-1} + \alpha_{2} y_{t-2} + u_{t} +\begin{aligned} \beta_0 & = \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} \cr +\tilde x_0 & = \hat x_0 + \beta_0 ( y_0 - G \hat x_0) \cr + \tilde \Sigma_0 & = \Sigma_0 - \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} G \Sigma_0 + \end{aligned} $$ -where $u_{t} \sim N \left(0, \sigma_{u}^{2}\right)$ and +### Step toward dynamics + +Now suppose that we are in a time series setting and that we have the +one-step state transition equation $$ -\left[\begin{array}{c} -y_{-1}\\ -y_{0} -\end{array}\right]\sim N\left(\mu_{\tilde{y}},\Sigma_{\tilde{y}}\right) +x_1 = A x_0 + C w_1 , \quad w_1 \sim {\mathcal N}(0, I ) $$ -It can be written as a stacked system +where $A$ is an $n \times n$ matrix and $C$ is an +$n \times m$ matrix. 
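Because, conditional on $y_0$, we have $x_0 \sim {\mathcal N}(\tilde x_0, \tilde \Sigma_0)$, while $w_1$ is independent of both $x_0$ and $y_0$, the transition equation delivers the conditional mean and conditional covariance

$$
E \left[ x_1 \mid y_0 \right] = A \tilde x_0 , \quad {\rm Var} \left( x_1 \mid y_0 \right) = A \tilde \Sigma_0 A' + C C'
$$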
+ +It follows that the probability distribution of $x_1$ conditional +on $y_0$ is $$ -\underset{\equiv A}{\underbrace{\left[\begin{array}{cccccccc} -1 & 0 & 0 & 0 & \cdots & 0 & 0 & 0\\ --\alpha_{1} & 1 & 0 & 0 & \cdots & 0 & 0 & 0\\ --\alpha_{2} & -\alpha_{1} & 1 & 0 & \cdots & 0 & 0 & 0\\ -0 & -\alpha_{2} & -\alpha_{1} & 1 & \cdots & 0 & 0 & 0\\ -\vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots & \vdots\\ -0 & 0 & 0 & 0 & \cdots & -\alpha_{2} & -\alpha_{1} & 1 -\end{array}\right]}}\left[\begin{array}{c} -y_{1}\\ -y_{2}\\ -y_{3}\\ -y_{4}\\ -\vdots\\ -y_{T} -\end{array}\right]=\underset{\equiv b}{\underbrace{\left[\begin{array}{c} -\alpha_{0}+\alpha_{1}y_{0}+\alpha_{2}y_{-1}\\ -\alpha_{0}+\alpha_{2}y_{0}\\ -\alpha_{0}\\ -\alpha_{0}\\ -\vdots\\ -\alpha_{0} -\end{array}\right]}} +x_1 | y_0 \sim {\mathcal N}(A \tilde x_0 , A \tilde \Sigma_0 A' + C C' ) $$ -We can compute $y$ by solving the system +Define $$ -y = A^{-1} \left(b + u\right) +\begin{aligned} \hat x_1 & = A \tilde x_0 \cr + \Sigma_1 & = A \tilde \Sigma_0 A' + C C' +\end{aligned} $$ -We have +### Dynamic version + +Suppose now that for $t \geq 0$, +$\{x_{t+1}, y_t\}_{t=0}^\infty$ are governed by the equations $$ \begin{aligned} -\mu_{y} = A^{-1} \mu_{b} \\ -\Sigma_{y} &= A^{-1} E \left[\left(b - \mu_{b} + u \right) \left(b - \mu_{b} + u \right)^{\prime}\right] \left(A^{-1}\right)^{\prime} \\ - &= A^{-1} \left(\Sigma_{b} + \Sigma_{u} \right) \left(A^{-1}\right)^{\prime} +x_{t+1} & = A x_t + C w_{t+1} \cr +y_t & = G x_t + v_t \end{aligned} $$ -where +where as before $x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0)$, +$w_{t+1}$ is the $t+1$th component of an i.i.d. stochastic +process distributed as $w_{t+1} \sim {\mathcal N}(0, I)$, and +$v_t$ is the $t$th component of an i.i.d. process +distributed as $v_t \sim {\mathcal N}(0, R)$ and the +$\{w_{t+1}\}_{t=0}^\infty$ and $\{v_t\}_{t=0}^\infty$ +processes are orthogonal at all pairs of dates. -$$ -\mu_{b}=\left[\begin{array}{c} -\alpha_{0}+\alpha_{1}\mu_{y_{0}}+\alpha_{2}\mu_{y_{-1}}\\ -\alpha_{0}+\alpha_{2}\mu_{y_{0}}\\ -\alpha_{0}\\ -\vdots\\ -\alpha_{0} -\end{array}\right] -$$ +The logic and +formulas that we applied above imply that the probability distribution +of $x_t$ conditional on +$y_0, y_1, \ldots , y_{t-1} = y^{t-1}$ is $$ -\Sigma_{b}=\left[\begin{array}{cc} -C\Sigma_{\tilde{y}}C^{\prime} & \boldsymbol{0}_{N-2\times N-2}\\ -\boldsymbol{0}_{N-2\times2} & \boldsymbol{0}_{N-2\times N-2} -\end{array}\right],\quad C=\left[\begin{array}{cc} -\alpha_{2} & \alpha_{1}\\ -0 & \alpha_{2} -\end{array}\right] +x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_t , A \tilde \Sigma_t A' + C C' ) $$ +where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be +computed by iterating on the following equations starting from +$t=1$ and initial conditions for +$\tilde x_0, \tilde \Sigma_0$ computed as we have above: + $$ -\Sigma_{u}=\left[\begin{array}{cccc} -\sigma_{u}^{2} & 0 & \cdots & 0\\ -0 & \sigma_{u}^{2} & \cdots & 0\\ -\vdots & \vdots & \vdots & \vdots\\ -0 & 0 & \cdots & \sigma_{u}^{2} -\end{array}\right] +\begin{aligned} \Sigma_t & = A \tilde \Sigma_{t-1} A' + C C' \cr + \hat x_t & = A \tilde x_{t-1} \cr +\beta_t & = \Sigma_t G' (G \Sigma_t G' + R)^{-1} \cr +\tilde x_t & = \hat x_t + \beta_t ( y_t - G \hat x_t) \cr + \tilde \Sigma_t & = \Sigma_t - \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t + \end{aligned} $$ +We can use the Python class *MultivariateNormal* to construct examples. 
+ +Here is an example for a single period problem at time $0$ + ```{code-cell} python3 -# set parameters -T = 80 -T = 160 -# coefficients of the second order difference equation -𝛼0 = 10 -𝛼1 = 1.53 -𝛼2 = -.9 +G = np.array([[1., 3.]]) +R = np.array([[1.]]) -# variance of u -σu = 1. -σu = 10. +x0_hat = np.array([0., 1.]) +Σ0 = np.array([[1., .5], [.3, 2.]]) -# distribution of y_{-1} and y_{0} -μy_tilde = np.array([1., 0.5]) -Σy_tilde = np.array([[2., 1.], [1., 0.5]]) +μ = np.hstack([x0_hat, G @ x0_hat]) +Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]]) ``` ```{code-cell} python3 -# construct A and A^{\prime} -A = np.zeros((T, T)) - -for i in range(T): - A[i, i] = 1 +# construction of the multivariate normal instance +multi_normal = MultivariateNormal(μ, Σ) +``` - if i-1 >= 0: - A[i, i-1] = -𝛼1 +```{code-cell} python3 +multi_normal.partition(2) +``` - if i-2 >= 0: - A[i, i-2] = -𝛼2 +```{code-cell} python3 +# the observation of y +y0 = 2.3 -A_inv = np.linalg.inv(A) +# conditional distribution of x0 +μ1_hat, Σ11 = multi_normal.cond_dist(0, y0) +μ1_hat, Σ11 ``` ```{code-cell} python3 -# compute the mean vectors of b and y -μb = np.full(T, 𝛼0) -μb[0] += 𝛼1 * μy_tilde[1] + 𝛼2 * μy_tilde[0] -μb[1] += 𝛼2 * μy_tilde[1] +A = np.array([[0.5, 0.2], [-0.1, 0.3]]) +C = np.array([[2.], [1.]]) -μy = A_inv @ μb +# conditional distribution of x1 +x1_cond = A @ μ1_hat +Σ1_cond = C @ C.T + A @ Σ11 @ A.T +x1_cond, Σ1_cond ``` +### Code for Iterating + +Here is code for solving a dynamic filtering problem by iterating on our +equations, followed by an example. + ```{code-cell} python3 -# compute the covariance matrices of b and y -Σu = np.eye(T) * σu ** 2 +def iterate(x0_hat, Σ0, A, C, G, R, y_seq): -Σb = np.zeros((T, T)) + p, n = G.shape -C = np.array([[𝛼2, 𝛼1], [0, 𝛼2]]) -Σb[:2, :2] = C @ Σy_tilde @ C.T + T = len(y_seq) + x_hat_seq = np.empty((T+1, n)) + Σ_hat_seq = np.empty((T+1, n, n)) -Σy = A_inv @ (Σb + Σu) @ A_inv.T + x_hat_seq[0] = x0_hat + Σ_hat_seq[0] = Σ0 + + for t in range(T): + xt_hat = x_hat_seq[t] + Σt = Σ_hat_seq[t] + μ = np.hstack([xt_hat, G @ xt_hat]) + Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]]) + + # filtering + multi_normal = MultivariateNormal(μ, Σ) + multi_normal.partition(n) + x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t]) + + # forecasting + x_hat_seq[t+1] = A @ x_tilde + Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T + + return x_hat_seq, Σ_hat_seq ``` -## Application to Stock Price Model +```{code-cell} python3 +iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2]) +``` -Let +The iterative algorithm just described is a version of the celebrated **Kalman filter**. -$$ -p_{t} = \sum_{j=0}^{T-t} \beta^{j} y_{t+j} -$$ +We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter ` -Form + +## Classic Factor Analysis Model + +The factor analysis model widely used in psychology and other fields can +be represented as $$ -\underset{\equiv p}{\underbrace{\left[\begin{array}{c} -p_{1}\\ -p_{2}\\ -p_{3}\\ -\vdots\\ -p_{T} -\end{array}\right]}}=\underset{\equiv B}{\underbrace{\left[\begin{array}{ccccc} -1 & \beta & \beta^{2} & \cdots & \beta^{T-1}\\ -0 & 1 & \beta & \cdots & \beta^{T-2}\\ -0 & 0 & 1 & \cdots & \beta^{T-3}\\ -\vdots & \vdots & \vdots & \vdots & \vdots\\ -0 & 0 & 0 & \cdots & 1 -\end{array}\right]}}\left[\begin{array}{c} -y_{1}\\ -y_{2}\\ -y_{3}\\ -\vdots\\ -y_{T} -\end{array}\right] +Y = \Lambda f + U $$ -we have +where + +1. $Y$ is $n \times 1$ random vector, + $E U U^{\prime} = D$ is a diagonal matrix, +1. 
$\Lambda$ is $n \times k$ coefficient matrix, +1. $f$ is $k \times 1$ random vector, + $E f f^{\prime} = I$, +1. $U$ is $n \times 1$ random vector, and $U \perp f$. +1. It is presumed that $k$ is small relative to $n$; often + $k$ is only $1$ or $2$, as in our IQ examples. + +This implies that $$ \begin{aligned} -\mu_{p} = B \mu_{y} \\ -\Sigma_{p} = B \Sigma_{y} B^{\prime} +\Sigma_y = E Y Y^{\prime} = \Lambda \Lambda^{\prime} + D \\ +E Y f^{\prime} = \Lambda \\ +E f Y^{\prime} = \Lambda^{\prime} \end{aligned} $$ -```{code-cell} python3 -β = .96 -``` - -```{code-cell} python3 -# construct B -B = np.zeros((T, T)) +Thus, the covariance matrix $\Sigma_Y$ is the sum of a diagonal +matrix $D$ and a positive semi-definite matrix +$\Lambda \Lambda^{\prime}$ of rank $k$. -for i in range(T): - B[i, i:] = β ** np.arange(0, T-i) -``` +This means that all covariances among the $n$ components of the +$Y$ vector are intermediated by their common dependencies on the +$k<$ factors. -Denote +Form $$ -z=\left[\begin{array}{c} -y\\ -p -\end{array}\right]=\underset{\equiv D}{\underbrace{\left[\begin{array}{c} -I\\ -B -\end{array}\right]}} y +Z=\left(\begin{array}{c} +f\\ +Y +\end{array}\right) $$ -Thus, $\{y_t\}_{t=1}^{T}$ and $\{p_t\}_{t=1}^{T}$ jointly -follow the multivariate normal distribution -$N \left(\mu_{z}, \Sigma_{z}\right)$, where +the covariance matrix of the expanded random vector $Z$ can be +computed as $$ -\mu_{z}=D\mu_{y} +\Sigma_{z} = EZZ^{\prime}=\left(\begin{array}{cc} +I & \Lambda^{\prime}\\ +\Lambda & \Lambda\Lambda^{\prime}+D +\end{array}\right) $$ -$$ -\Sigma_{z}=D\Sigma_{y}D^{\prime} -$$ +In the following, we first construct the mean vector and the covariance +matrix for the case where $N=10$ and $k=2$. ```{code-cell} python3 -D = np.vstack([np.eye(T), B]) +N = 10 +k = 2 ``` -```{code-cell} python3 -μz = D @ μy -Σz = D @ Σy @ D.T -``` +We set the coefficient matrix $\Lambda$ and the covariance matrix +of $U$ to be -We can simulate paths of $y_{t}$ and $p_{t}$ and compute the -conditional mean $E \left[p_{t} \mid y_{t-1}, y_{t}\right]$ using -the `MultivariateNormal` class. +$$ +\Lambda=\left(\begin{array}{cc} +1 & 0\\ +\vdots & \vdots\\ +1 & 0\\ +0 & 1\\ +\vdots & \vdots\\ +0 & 1 +\end{array}\right),\quad D=\left(\begin{array}{cccc} +\sigma_{u}^{2} & 0 & \cdots & 0\\ +0 & \sigma_{u}^{2} & \cdots & 0\\ +\vdots & \vdots & \vdots & \vdots\\ +0 & 0 & \cdots & \sigma_{u}^{2} +\end{array}\right) +$$ + +where the first half of the first column of $\Lambda$ is filled +with $1$s and $0$s for the rest half, and symmetrically +for the second column. + +$D$ is a diagonal matrix with parameter +$\sigma_{u}^{2}$ on the diagonal. ```{code-cell} python3 -z = np.random.multivariate_normal(μz, Σz) -y, p = z[:T], z[T:] +Λ = np.zeros((N, k)) +Λ[:N//2, 0] = 1 +Λ[N//2:, 1] = 1 + +σu = .5 +D = np.eye(N) * σu ** 2 ``` ```{code-cell} python3 -cond_Ep = np.empty(T-1) +# compute Σy +Σy = Λ @ Λ.T + D +``` -sub_μ = np.empty(3) -sub_Σ = np.empty((3, 3)) -for t in range(2, T+1): - sub_μ[:] = μz[[t-2, t-1, T-1+t]] - sub_Σ[:, :] = Σz[[t-2, t-1, T-1+t], :][:, [t-2, t-1, T-1+t]] +We can now construct the mean vector and the covariance matrix for +$Z$. 
- multi_normal = MultivariateNormal(sub_μ, sub_Σ) - multi_normal.partition(2) +```{code-cell} python3 +μz = np.zeros(k+N) - cond_Ep[t-2] = multi_normal.cond_dist(1, y[t-2:t])[0][0] +Σz = np.empty((k+N, k+N)) + +Σz[:k, :k] = np.eye(k) +Σz[:k, k:] = Λ.T +Σz[k:, :k] = Λ +Σz[k:, k:] = Σy ``` ```{code-cell} python3 -plt.plot(range(1, T), y[1:], label='$y_{t}$') -plt.plot(range(1, T), y[:-1], label='$y_{t-1}$') -plt.plot(range(1, T), p[1:], label='$p_{t}$') -plt.plot(range(1, T), cond_Ep, label='$Ep_{t}|y_{t}, y_{t-1}$') +z = np.random.multivariate_normal(μz, Σz) -plt.xlabel('t') -plt.legend(loc=1) -plt.show() +f = z[:k] +y = z[k:] ``` -In the above graph, the green line is what the price of the stock would -be if people had perfect foresight about the path of dividends while the -green line is the conditional expectation $E p_t | y_t, y_{t-1}$, which is what the price would -be if people did not have perfect foresight but were optimally -predicting future dividends on the basis of the information -$y_t, y_{t-1}$ at time $t$. - -## Filtering Foundations - -Assume that $x_0$ is an $n \times 1$ random vector and that -$y_0$ is a $p \times 1$ random vector determined by the -*observation equation* - -$$ -y_0 = G x_0 + v_0 , \quad x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0), \quad v_0 \sim {\mathcal N}(0, R) -$$ +```{code-cell} python3 +multi_normal_factor = MultivariateNormal(μz, Σz) +multi_normal_factor.partition(k) +``` -where $v_0$ is orthogonal to $x_0$, $G$ is a -$p \times n$ matrix, and $R$ is a $p \times p$ -positive definite matrix. +Let’s compute the conditional distribution of the hidden factor +$f$ on the observations $Y$, namely, $f \mid Y=y$. -We consider the problem of someone who *observes* $y_0$, who does -not observe $x_0$, who knows $\hat x_0, \Sigma_0, G, R$ – -and therefore knows the joint probability distribution of the vector -$\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ – and who wants to -infer $x_0$ from $y_0$ in light of what he knows about that -joint probability distribution. +```{code-cell} python3 +multi_normal_factor.cond_dist(0, y) +``` -Therefore, the person wants to construct the probability distribution of -$x_0$ conditional on the random vector $y_0$. +We can verify that the conditional mean +$E \left[f \mid Y=y\right] = B Y$ where +$B = \Lambda^{\prime} \Sigma_{y}^{-1}$. -The joint distribution of -$\begin{bmatrix} x_0 \cr y_0 \end{bmatrix}$ is multivariate normal -${\mathcal N}(\mu, \Sigma)$ with +```{code-cell} python3 +B = Λ.T @ np.linalg.inv(Σy) -$$ -\mu = \begin{bmatrix} \hat x_0 \cr G \hat x_0 \end{bmatrix} , \quad - \Sigma = \begin{bmatrix} \Sigma_0 & \Sigma_0 G' \cr - G \Sigma_0 & G \Sigma_0 G' + R \end{bmatrix} -$$ +B @ y +``` -By applying an appropriate instance of the above formulas for the mean vector $\hat \mu_1$ and covariance matrix -$\hat \Sigma_{11}$ of $z_1$ conditional on $z_2$, we find that the probability distribution of -$x_0$ conditional on $y_0$ is -${\mathcal N}(\tilde x_0, \tilde \Sigma_0)$ where +Similarly, we can compute the conditional distribution $Y \mid f$. -$$ -\begin{aligned} \beta_0 & = \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} \cr -\tilde x_0 & = \hat x_0 + \beta_0 ( y_0 - G \hat x_0) \cr - \tilde \Sigma_0 & = \Sigma_0 - \Sigma_0 G' (G \Sigma_0 G' + R)^{-1} G \Sigma_0 - \end{aligned} -$$ +```{code-cell} python3 +multi_normal_factor.cond_dist(1, f) +``` -### Step toward dynamics +It can be verified that the mean is +$\Lambda I^{-1} f = \Lambda f$. 
-Now suppose that we are in a time series setting and that we have the -one-step state transition equation +```{code-cell} python3 +Λ @ f +``` -$$ -x_1 = A x_0 + C w_1 , \quad w_1 \sim {\mathcal N}(0, I ) -$$ +## PCA and Factor Analysis -where $A$ is an $n \times n$ matrix and $C$ is an -$n \times m$ matrix. +To learn about Principal Components Analysis (PCA), please see this lecture {doc}`Singular Value Decompositions `. -It follows that the probability distribution of $x_1$ conditional -on $y_0$ is +For fun, let’s apply a PCA decomposition +to a covariance matrix $\Sigma_y$ that in fact is governed by our factor-analytic +model. -$$ -x_1 | y_0 \sim {\mathcal N}(A \tilde x_0 , A \tilde \Sigma_0 A' + C C' ) -$$ -Define -$$ -\begin{aligned} \hat x_1 & = A \tilde x_0 \cr - \Sigma_1 & = A \tilde \Sigma_0 A' + C C' -\end{aligned} -$$ +Technically, this means that the PCA model is misspecified. (Can you +explain why?) -### Dynamic version +Nevertheless, this exercise will let us study how well the first two +principal components from a PCA can approximate the conditional +expectations $E f_i | Y$ for our two factors $f_i$, +$i=1,2$ for the factor analytic model that we have assumed truly +governs the data on $Y$ we have generated. -Suppose now that for $t \geq 0$, -$\{x_{t+1}, y_t\}_{t=0}^\infty$ are governed by the equations +So we compute the PCA decomposition $$ -\begin{aligned} -x_{t+1} & = A x_t + C w_{t+1} \cr -y_t & = G x_t + v_t -\end{aligned} +\Sigma_{y} = P \tilde{\Lambda} P^{\prime} $$ -where as before $x_0 \sim {\mathcal N}(\hat x_0, \Sigma_0)$, -$w_{t+1}$ is the $t+1$th component of an i.i.d. stochastic -process distributed as $w_{t+1} \sim {\mathcal N}(0, I)$, and -$v_t$ is the $t$th component of an i.i.d. process -distributed as $v_t \sim {\mathcal N}(0, R)$ and the -$\{w_{t+1}\}_{t=0}^\infty$ and $\{v_t\}_{t=0}^\infty$ -processes are orthogonal at all pairs of dates. +where $\tilde{\Lambda}$ is a diagonal matrix. -The logic and -formulas that we applied above imply that the probability distribution -of $x_t$ conditional on -$y_0, y_1, \ldots , y_{t-1} = y^{t-1}$ is +We have $$ -x_t | y^{t-1} \sim {\mathcal N}(A \tilde x_t , A \tilde \Sigma_t A' + C C' ) +Y = P \epsilon $$ -where $\{\tilde x_t, \tilde \Sigma_t\}_{t=1}^\infty$ can be -computed by iterating on the following equations starting from -$t=1$ and initial conditions for -$\tilde x_0, \tilde \Sigma_0$ computed as we have above: +and $$ -\begin{aligned} \Sigma_t & = A \tilde \Sigma_{t-1} A' + C C' \cr - \hat x_t & = A \tilde x_{t-1} \cr -\beta_t & = \Sigma_t G' (G \Sigma_t G' + R)^{-1} \cr -\tilde x_t & = \hat x_t + \beta_t ( y_t - G \hat x_t) \cr - \tilde \Sigma_t & = \Sigma_t - \Sigma_t G' (G \Sigma_t G' + R)^{-1} G \Sigma_t - \end{aligned} +\epsilon = P^\prime Y $$ -We can use the Python class *MultivariateNormal* to construct examples. - -Here is an example for a single period problem at time $0$ +Note that we will arrange the eigenvectors in $P$ in the +*descending* order of eigenvalues. 
```{code-cell} python3 -G = np.array([[1., 3.]]) -R = np.array([[1.]]) +𝜆_tilde, P = np.linalg.eigh(Σy) -x0_hat = np.array([0., 1.]) -Σ0 = np.array([[1., .5], [.3, 2.]]) +# arrange the eigenvectors by eigenvalues +ind = sorted(range(N), key=lambda x: 𝜆_tilde[x], reverse=True) -μ = np.hstack([x0_hat, G @ x0_hat]) -Σ = np.block([[Σ0, Σ0 @ G.T], [G @ Σ0, G @ Σ0 @ G.T + R]]) +P = P[:, ind] +𝜆_tilde = 𝜆_tilde[ind] +Λ_tilde = np.diag(𝜆_tilde) + +print('𝜆_tilde =', 𝜆_tilde) ``` ```{code-cell} python3 -# construction of the multivariate normal instance -multi_normal = MultivariateNormal(μ, Σ) +# verify the orthogonality of eigenvectors +np.abs(P @ P.T - np.eye(N)).max() ``` ```{code-cell} python3 -multi_normal.partition(2) +# verify the eigenvalue decomposition is correct +P @ Λ_tilde @ P.T ``` ```{code-cell} python3 -# the observation of y -y0 = 2.3 +ε = P.T @ y -# conditional distribution of x0 -μ1_hat, Σ11 = multi_normal.cond_dist(0, y0) -μ1_hat, Σ11 +print("ε = ", ε) ``` ```{code-cell} python3 -A = np.array([[0.5, 0.2], [-0.1, 0.3]]) -C = np.array([[2.], [1.]]) +# print the values of the two factors -# conditional distribution of x1 -x1_cond = A @ μ1_hat -Σ1_cond = C @ C.T + A @ Σ11 @ A.T -x1_cond, Σ1_cond +print('f = ', f) ``` -### Code for Iterating +Below we’ll plot several things -Here is code for solving a dynamic filtering problem by iterating on our -equations, followed by an example. +- the $N$ values of $y$ +- the $N$ values of the principal components $\epsilon$ +- the value of the first factor $f_1$ plotted only for the first + $N/2$ observations of $y$ for which it receives a + non-zero loading in $\Lambda$ +- the value of the second factor $f_2$ plotted only for the final + $N/2$ observations for which it receives a non-zero loading in + $\Lambda$ ```{code-cell} python3 -def iterate(x0_hat, Σ0, A, C, G, R, y_seq): +plt.scatter(range(N), y, label='y') +plt.scatter(range(N), ε, label='$\epsilon$') +plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') +plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') +plt.legend() - p, n = G.shape +plt.show() +``` - T = len(y_seq) - x_hat_seq = np.empty((T+1, n)) - Σ_hat_seq = np.empty((T+1, n, n)) +Consequently, the first two $\epsilon_{j}$ correspond to the +largest two eigenvalues. - x_hat_seq[0] = x0_hat - Σ_hat_seq[0] = Σ0 +Let’s look at them, after which we’ll look at $E f | y = B y$ - for t in range(T): - xt_hat = x_hat_seq[t] - Σt = Σ_hat_seq[t] - μ = np.hstack([xt_hat, G @ xt_hat]) - Σ = np.block([[Σt, Σt @ G.T], [G @ Σt, G @ Σt @ G.T + R]]) +```{code-cell} python3 +ε[:2] +``` - # filtering - multi_normal = MultivariateNormal(μ, Σ) - multi_normal.partition(n) - x_tilde, Σ_tilde = multi_normal.cond_dist(0, y_seq[t]) +```{code-cell} python3 +# compare with Ef|y +B @ y +``` - # forecasting - x_hat_seq[t+1] = A @ x_tilde - Σ_hat_seq[t+1] = C @ C.T + A @ Σ_tilde @ A.T +The fraction of variance in $y_{t}$ explained by the first two +principal components can be computed as below. - return x_hat_seq, Σ_hat_seq +```{code-cell} python3 +𝜆_tilde[:2].sum() / 𝜆_tilde.sum() ``` +Compute + +$$ +\hat{Y} = P_{j} \epsilon_{j} + P_{k} \epsilon_{k} +$$ + +where $P_{j}$ and $P_{k}$ correspond to the largest two +eigenvalues. + ```{code-cell} python3 -iterate(x0_hat, Σ0, A, C, G, R, [2.3, 1.2, 3.2]) +y_hat = P[:, :2] @ ε[:2] ``` -The iterative algorithm just described is a version of the celebrated **Kalman filter**. 
+In this example, it turns out that the projection $\hat{Y}$ of +$Y$ on the first two principal components does a good job of +approximating $Ef \mid y$. -We describe the Kalman filter and some applications of it in {doc}`A First Look at the Kalman Filter ` +We confirm this in the following plot of $f$, +$E y \mid f$, $E f \mid y$, and $\hat{y}$ on the +coordinate axis versus $y$ on the ordinate axis. + +```{code-cell} python3 +plt.scatter(range(N), Λ @ f, label='$Ey|f$') +plt.scatter(range(N), y_hat, label='$\hat{y}$') +plt.hlines(f[0], 0, N//2-1, ls='--', label='$f_{1}$') +plt.hlines(f[1], N//2, N-1, ls='-.', label='$f_{2}$') + +Efy = B @ y +plt.hlines(Efy[0], 0, N//2-1, ls='--', color='b', label='$Ef_{1}|y$') +plt.hlines(Efy[1], N//2, N-1, ls='-.', color='b', label='$Ef_{2}|y$') +plt.legend() + +plt.show() +``` + +The covariance matrix of $\hat{Y}$ can be computed by first +constructing the covariance matrix of $\epsilon$ and then use the +upper left block for $\epsilon_{1}$ and $\epsilon_{2}$. + +```{code-cell} python3 +Σεjk = (P.T @ Σy @ P)[:2, :2] + +Pjk = P[:, :2] + +Σy_hat = Pjk @ Σεjk @ Pjk.T +print('Σy_hat = \n', Σy_hat) +```