Adaptive shrinkage of state-specific elevation coefficients #3

Open · dhimmel opened this issue Oct 30, 2016 · 6 comments

dhimmel commented Oct 30, 2016

I was talking with @stephens999 at the Moore Investigator Symposium. His group has developed an "adaptive shrinkage" procedure. I wanted to get a feel for the method, so I'm exploring how it affects Figure 5 of our lung cancer study.

The following code takes the state-specific elevation coefficients and performs adaptive shrinkage:

# Install ashr
devtools::install_github('stephens999/ashr', ref='293847b9a413e65291d40784d76004b4b7e07b0f')

# Read state model information used in https://doi.org/10.7717/peerj.705/fig-5
url = 'https://github.com/dhimmel/elevcan/raw/7aed9f29d2371eb4918f337a138608e6b6d9e311/output/figdata/lung-state-strat.txt'
state_df = readr::read_tsv(url)
state_df = dplyr::filter(state_df, state != 'Meta')
# Extract the numeric part of state_n (the per-state county count)
state_df$n_counties = tidyr::extract_numeric(state_df$state_n)

# Perform adaptive shrinkage across the state-specific estimates
ash_result = ashr::ash(
  betahat = state_df$elevation.coef,  # observed coefficients
  sebetahat = state_df$elevation.se,  # their standard errors
  df = state_df$n_counties - 1,       # per-state degrees of freedom (a vector)
  mode = 'estimate')                  # estimate the mode of the unimodal prior rather than fixing it at zero

# Plot coefficients with and without adaptive shrinkage
plot_df = dplyr::bind_rows(
  dplyr::transmute(state_df,
    state,
    coef = elevation.coef,
    se = elevation.se,
    shrinkage=FALSE),
  dplyr::data_frame(
    state = state_df$state, 
    coef = ashr::get_pm(a = ash_result),
    se = ashr::get_psd(a = ash_result),
    shrinkage=TRUE)
)

states = dplyr::arrange(state_df, elevation.coef)$state

gg = ggplot2::ggplot(plot_df, ggplot2::aes(coef, state, color=shrinkage)) +
  ggplot2::geom_vline(xintercept = 0, color='gray', linetype='dashed', size=0.75) +
  ggplot2::geom_errorbarh(
    ggplot2::aes(xmax = coef + se, xmin = coef - se),
    alpha=0.45, size=1.5, height = .5) +
  ggplot2::xlab(expression(beta[elevation])) +
  ggplot2::scale_y_discrete(limits = states) +
  ggplot2::theme_bw()
ggplot2::ggsave('~/Desktop/state-shrinkage.png', gg, width = 5, height = 3, dpi = 300)

The code creates the following visualization, which shows the unmodified and shrunken coefficients with their standard errors (or what I assume are the analogous uncertainty measures for the shrunken estimates):

[figure: state-shrinkage.png, state-specific elevation coefficients with and without adaptive shrinkage]

@stephens999, it looks like the standard errors are smaller after shrinkage. Does that make sense? Any insights you have on what has happened and whether this is productive would be appreciated.

stephens999 commented Oct 31, 2016

Yes, standard errors (actually posterior standard deviations) are smaller after shrinkage. This is normal because you are combining information across all states in estimating the value for each state. Because you are using more information, you usually get less uncertainty than if you looked at just the data for that state alone.

Of course, combining information across states involves making some modelling assumptions. (I think some assumptions are probably inevitably required to do this.) In this case we assume that the distribution of effects across states is unimodal; I'm not worried about making that assumption here.

One additional issue is that the method ignores any error in the estimated unimodal distribution. (This is a general criticism of empirical Bayes methods.) This means the posterior standard deviations are going to be a bit smaller than they should be. With large sample sizes this is likely not a very big issue, but with only about 10 samples, as you have here, it is worth bearing in mind.
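
A minimal simulated sketch of this borrowing-of-strength effect (the numbers below are invented for illustration and are not the elevation data): each posterior standard deviation from ash typically comes out smaller than the corresponding input standard error.

# Simulate 10 noisy effect estimates with known standard errors
set.seed(42)
true_effects = rnorm(10, mean = -0.5, sd = 0.2)
se = rep(0.3, 10)
betahat = rnorm(10, mean = true_effects, sd = se)

# Shrink with ash (normal likelihood, default unimodal prior)
sim_fit = ashr::ash(betahat = betahat, sebetahat = se)

# Compare uncertainty before and after shrinkage
all(ashr::get_psd(a = sim_fit) < se)  # typically TRUE: posterior SDs shrink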

stephens999 commented Oct 31, 2016

BTW, one thing this analysis suggests that maybe isn't obvious from the non-shrunk analysis is that the NM coefficient is likely smaller (more negative) than those of the other states.

Not sure if that is interesting to you, or has some possible explanations?
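
A quick way to check which state ends up most negative after shrinkage, reusing state_df and ash_result from the snippet above:

# State whose posterior mean coefficient is most negative
state_df$state[which.min(ashr::get_pm(a = ash_result))]  # expected: NM, per the comment above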

dhimmel commented Oct 31, 2016

> BTW, one thing this analysis suggests that maybe isn't obvious from the non-shrunk analysis is that the NM coefficient is likely smaller (more negative) than those of the other states.

That is very interesting and a great example of where adaptive shrinkage provides a unique insight. Will note this finding for future investigation (CC @ksimeono).

> Yes, standard errors (actually posterior standard deviations) are smaller after shrinkage. This is normal because you are combining information across all states in estimating the value for each state.

@stephens999: Cool, makes sense. So the confidence intervals should also be smaller? How do you recommend getting confidence intervals out of the ash_result? In general, I think confidence intervals are more informative for visualization than standard errors.

stephens999 commented Nov 1, 2016

The current interface is ashci.

May change this in the near future.

dhimmel commented Nov 3, 2016

When I run:

ashr::ashci(a = ash_result)

I get the error:

Error in extract_data(data, i) : 
  extracting data not supported for non-constant likelihoods

Not a huge deal for me, just wanted to let you know.

stephens999 commented Nov 3, 2016

OK, that's because you have a different df for each observation - not something we come across much in our applications, so we haven't focussed on it.

Good to know for the future though.
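
A possible workaround, sketched under the assumption that a single shared df is acceptable here (taking the smallest per-state df is an ad hoc, conservative choice): refit with a constant likelihood, then request intervals. Argument names may differ across ashr versions.

# Refit using one shared df so the likelihood is constant across observations
common_df = min(state_df$n_counties) - 1
ash_const = ashr::ash(
  betahat = state_df$elevation.coef,
  sebetahat = state_df$elevation.se,
  df = common_df,
  mode = 'estimate')

# Credible intervals from the constant-likelihood fit
ci = ashr::ashci(a = ash_const, level = 0.95)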
