interface revision #297

Open
peisenha opened this issue Nov 25, 2019 · 4 comments
Labels
enhancement An addition or change to an existing feature. priority-low

Comments

@peisenha

Description

We are considering revising the respy interface, and this issue serves to collect thoughts and use cases. At the moment, we initialize a simulate() and a crit_func() and, in doing so, run a host of setup operations. In particular, we create the StateSpace class instance, which is time-consuming.

For simulation:

simulate = rp.get_simulate_func(params, options)
df = simulate(params)

For estimation:

crit_func = rp.get_crit_func(params, options, df)
crit_func(params)
@peisenha added the enhancement, priority-low, and size-M labels on Nov 25, 2019
@peisenha
Author

The purpose of the package is to serve as a computational sandbox. Some features of the current interface make this harder than it probably should be. However, I might just be missing the proper workflow at this point, so any clarification is welcome (@tobiasraabe).

  • I am running a bootstrap. Each time I sample a new dataset, I need to initialize a new criterion function, which entails the costly creation of the StateSpace class even though it remains unchanged throughout the exercise.

  • I am investigating the effect of numerical tuning parameters on the shape of the likelihood function. I iterate over different numbers of Monte Carlo draws by changing the options file. Again, I need to re-create the criterion function each time. This might be relevant for your notebook as well, @rafaelsuchy.
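
Both use cases share one pattern: the expensive setup is repeated inside a loop even though only the data or a tuning option changes. The following is a minimal, self-contained sketch of that pattern for the bootstrap case. Note that `build_state_space`, `get_crit_func`, and `bootstrap` are hypothetical stand-ins written for illustration, not respy functions, and the counter exists only to make the repeated setup visible.

```python
import random

setup_calls = 0  # counts how often the expensive setup runs


def build_state_space(options):
    """Hypothetical stand-in for the costly StateSpace creation."""
    global setup_calls
    setup_calls += 1
    return {"n_periods": options["n_periods"]}


def get_crit_func(params, options, data):
    """Hypothetical stand-in that mimics the current interface: setup runs on every call."""
    state_space = build_state_space(options)

    def crit_func(params):
        # Toy criterion: scaled sum of squared deviations from one parameter.
        total = sum((x - params["mean"]) ** 2 for x in data)
        return total / state_space["n_periods"]

    return crit_func


def bootstrap(params, options, data, n_boot, seed=0):
    """Resample the data with replacement and evaluate the criterion each time."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_boot):
        resample = rng.choices(data, k=len(data))
        # The setup is repeated here although the options never change.
        crit_func = get_crit_func(params, options, resample)
        values.append(crit_func(params))
    return values


params = {"mean": 0.0}
options = {"n_periods": 5}
values = bootstrap(params, options, data=[1.0, 2.0, 3.0], n_boot=10)
print(setup_calls)  # one setup call per bootstrap draw
```

In this toy version the setup is cheap, so the counter is the only thing worth looking at; with a real state space each of those calls would carry the full creation cost.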

@janosg
Member

janosg commented Nov 25, 2019

I think for a bootstrap it does not really matter, because the setup cost is small compared to the cost of running the bootstrap itself. But in general it is true that we should reduce setup costs.

The reason we implemented it like this is that always re-creating everything reduces complexity by a lot, compared to taking some old model instance, determining which parts have to change, and re-creating only those.

Therefore, I suggest the following:

  • We first try to reduce the setup costs themselves before we try to reduce the number of times we incur them. If it is mainly the StateSpace creation, there are definitely ways to make it faster.
  • If this is not enough, we try to cache the most expensive functions. We should not manually check what has to be re-created, but use an existing solution like joblib memcache.
  • Only if this is still not enough would I consider re-using instances of some model class.
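
To make the caching idea concrete, here is a minimal sketch that memoizes an expensive setup function using the standard library's `functools.lru_cache`. This is an in-memory stand-in chosen to keep the example self-contained; joblib's `Memory` class offers a comparable decorator backed by disk. The function names and the option keys `n_periods`, `n_choices`, and `n_draws` are hypothetical, and the sketch assumes, purely for illustration, that the number of draws does not affect the state space.

```python
from functools import lru_cache

build_count = 0  # counts how often the expensive function body runs


@lru_cache(maxsize=8)
def _cached_state_space(n_periods, n_choices):
    """Hypothetical expensive setup; runs once per distinct argument tuple."""
    global build_count
    build_count += 1
    return tuple((p, c) for p in range(n_periods) for c in range(n_choices))


def get_state_space(options):
    # Pass only the options assumed to affect the state space, so that
    # changing any other option does not invalidate the cache.
    return _cached_state_space(options["n_periods"], options["n_choices"])


base = {"n_periods": 3, "n_choices": 2, "n_draws": 100}
more_draws = {**base, "n_draws": 500}
longer = {**base, "n_periods": 4}

a = get_state_space(base)
b = get_state_space(more_draws)  # cache hit: same n_periods and n_choices
c = get_state_space(longer)      # cache miss: n_periods changed
```

The cache decides what to recompute from the function arguments alone, which is the point of the suggestion: no manual bookkeeping of which model parts changed.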

@peisenha
Author

Points for our discussion on Thursday:

  • Yes, the setup costs are small for a serious bootstrap exercise but are "sizeable" during prototyping that only involves a small number of function evaluations for testing purposes.

  • The caching solution looks interesting. We might just need to look for one that does not require writing to disk, depending on how large a dump of the StateSpace instance is.

  • Also, I would like us to discuss in our next call whether we want the respy interfaces to work with DataFrames that have Individual and Period as a pd.MultiIndex.
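
On the question of how large a dump would be, one rough way to find out is to serialize the object in memory and count the bytes. The sketch below does this with the standard `pickle` module; `make_toy_state_space` is a hypothetical stand-in built from plain Python containers, not the respy StateSpace class, so the numbers it produces say nothing about respy itself.

```python
import pickle


def make_toy_state_space(n_states):
    """Hypothetical stand-in for a state space, built from plain containers."""
    return {"states": list(range(n_states))}


def dump_size_bytes(obj):
    """Return the number of bytes a pickle dump of obj occupies."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


small = dump_size_bytes(make_toy_state_space(10))
large = dump_size_bytes(make_toy_state_space(10_000))
print(small, large)
```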

@tobiasraabe
Member

The multiindex is already implemented in #277 and will be merged into master at some point.
