14.31 code #20

ColdTeapot273K · 2022-06-15T23:01:34Z

TLDR: I think a piece of code in the book is hacky in one place and works correctly by sheer coincidence; perhaps worth correcting for it here because it took quite a while for me to debug this (and probably took/will take for someone else) as i was following the example notebooks.

Proposed solution:
Turn this:

kl_data = dict(
    N=kl_dyads.shape[0],
    N_households=kl_dyads.hidB.max(),
    did=kl_dyads.did.values - 1,
    hidA=kl_dyads.hidA.values - 1,
    hidB=kl_dyads.hidB.values - 1,
    giftsAB=kl_dyads.giftsAB.values,
    giftsBA=kl_dyads.giftsBA.values,
)

Into this:

kl_data = dict(
    N=kl_dyads.shape[0],
    # N_households=kl_dyads.hidB.max(),
    did=kl_dyads.did.values - 1,
    hidA=kl_dyads.hidA.values - 1,
    hidB=kl_dyads.hidB.values - 1,
    giftsAB=kl_dyads.giftsAB.values,
    giftsBA=kl_dyads.giftsBA.values,
)

kl_data["N_households"] = len(set(kl_data["hidA"]) | set(kl_data["hidB"]))

Details:
While reproducing models from 2022 lectures and the book with pandas + numpyro + plotly stack and found a peculiar thing:

in 14.31 code N_households is computed as kl_dyads.hidB.max(), same as in the book. It's implied that hidB and hidA in original data are 1-indexed, so -1 is performed at kl_dyads init. But N_households is computed without -1.
And one must pay attention to it if one computes it not at init but somewhere downstream instead (like I insolently did). The model results (coefficients) with will look quite the same but the sampling will be bad (very low n_eff), as if not reparameterized. A recipe for a very fun time debugging correlated varying effects :)
The book's code, I think, is a hack since proper logic would imply instead computing a number of unique entries in hidB and hidA (see solution).

The text was updated successfully, but these errors were encountered:

fehiepsi · 2022-06-16T17:45:30Z

You are right. N_households needs to be the size of unique numbers in hidB and hidA. We also need to make sure that hidB and hidA ranges from 0 to N_households-1, otherwise we can use hidB and hidA as index arrays. (e.g. hidB = [0, 2, 4] & N_households=3 would be invalid)

ColdTeapot273K · 2022-06-16T22:01:17Z

We also need to make sure that hidB and hidA ranges from 0 to N_households-1

how about

kl_data["N_households"] = max(set(df_dev["hidA"]) | set(df_dev["hidB"])) + 1

# testing the logic:
testA = [0, 1, 2]
testB = [3,5]
assert max(set(testA) | set(testB)) + 1 == len([x for x in range(5+1)])

Btw, since we are at it, in that same section is there really a benefit in using lower Cholesky (LKJCholesky) instead of vanilla Cholesky (LKJ)?
I've found speed to be quite the same while in terms of readability this:

# dyad effects
z = numpyro.sample("z", dist.Normal(0, 1).expand([2, N]))
L_Rho_d = numpyro.sample("L_Rho_d", dist.LKJCholesky(2, 8))
sigma_d = numpyro.sample("sigma_d", dist.Exponential(1))
d = numpyro.deterministic(
    "d", ((jnp.repeat(sigma_d, 2)[..., None] * L_Rho_d) @ z).T
)

is somewhat more cryptic than this:

# dyad effects
L_Rho_d = numpyro.sample("L_Rho_d", dist.LKJ(2, 8))
sigma_d = numpyro.sample("sigma_d", dist.Exponential(1))
cov = jnp.outer(sigma_d, sigma_d) * L_Rho_d
d = numpyro.sample("d", dist.MultivariateNormal(0, cov).expand([N]))

fehiepsi · 2022-06-17T00:03:18Z

Sure, maybe modifying directly at N_households=max(... of kldyas) is cleaner. Could you help me make the change?

Regarding LKJ, the cholesky version is recommended if you need speed or face numerical issues. The LKJ version is definitely cleaner.

fehiepsi · 2024-02-15T03:28:13Z

I added a comment in #31 for clarification. Thanks, @ColdTeapot273K!

fehiepsi mentioned this issue Feb 15, 2024

Use networkx instead of causalgraphicalmodels #31

Merged

fehiepsi closed this as completed Feb 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

14.31 code #20

14.31 code #20

ColdTeapot273K commented Jun 15, 2022

fehiepsi commented Jun 16, 2022

ColdTeapot273K commented Jun 16, 2022

fehiepsi commented Jun 17, 2022

fehiepsi commented Feb 15, 2024

14.31 code #20

14.31 code #20

Comments

ColdTeapot273K commented Jun 15, 2022

fehiepsi commented Jun 16, 2022

ColdTeapot273K commented Jun 16, 2022

fehiepsi commented Jun 17, 2022

fehiepsi commented Feb 15, 2024