
Packaging and folder structure #12

Open
bobturneruk opened this issue Aug 12, 2021 · 4 comments

@bobturneruk
Contributor

I suggest that this repo is organised as follows:

root

  • experiments/ (runs to refer to in a paper)
  • scenarios/ (specific examples)
  • causcumber/ (the generalisable part of the code)
  • setup.py (incorporates requirements)

Then to work on the project one would (from the repo root):

pip install -e .

which would install causcumber in editable mode, so that the generalisable parts can be imported from anywhere while still being edited in place.

This structure would facilitate pushing to PyPI.
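
As a rough illustration of the "setup.py (incorporates requirements)" item, here is a minimal sketch. It assumes the dependencies live in a requirements.txt-style file; the version number and the helper names `parse_requirements` and `build_setup_kwargs` are hypothetical and not taken from the repo.

```python
"""setup.py sketch for the proposed layout (all metadata values are assumptions)."""


def parse_requirements(text):
    """Return requirement specifiers from requirements.txt-style text.

    Blank lines and lines starting with '#' are skipped.
    """
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


def build_setup_kwargs(requirements_text):
    """Collect the keyword arguments that setup.py would pass to setuptools.setup()."""
    return {
        "name": "causcumber",
        "version": "0.0.1",  # placeholder version, not from the repo
        # Only the generalisable package is installed; experiments/ and
        # scenarios/ stay in the repo as plain folders.
        "packages": ["causcumber"],
        "install_requires": parse_requirements(requirements_text),
    }


# In a real setup.py the file would end with something like:
#
#     from pathlib import Path
#     from setuptools import setup
#     setup(**build_setup_kwargs(Path("requirements.txt").read_text()))
```

With a file like this at the repo root, `pip install -e .` would pull in the listed requirements and make `import causcumber` work from experiments/ and scenarios/.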

@bobturneruk
Contributor Author

R dependencies (see the _install_r_packages(package_names) function) may need special consideration.

@jmafoster1
Contributor

This seems sensible. In my experience, GitHub gets awkward once you start generating gigabytes of experimental data, so it may be best to keep that local for now, at least until we have the final datasets. Alternatively, we could use something like ORDA and upload the lot there when we're done.

The R dependencies are a bit of a nightmare. I had an issue with that at the weekend when I was working on my laptop. Behave captures output by default, so I had no idea it was waiting for me to give it permission to install the R dependencies. We only use R because Dagitty is so much faster than DoWhy at producing the causal estimates, so it may be worth looking into the algorithm Dagitty uses and reimplementing it in Python, depending on how difficult or time-consuming that would be.
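
One possible way to avoid the hidden prompt is sketched below. It assumes Behave's capture works by swapping out `sys.stdout` and `sys.stderr`, so writing to `sys.__stderr__` (the interpreter's original stream) should still reach the terminal. The function name `ask_permission` and its parameters are hypothetical and not part of the existing code. Running `behave --no-capture` is another way to see the prompt.

```python
import sys


def ask_permission(package_names, stream=None, reader=input):
    """Ask the user whether the given R packages may be installed.

    The prompt goes to the original stderr rather than sys.stderr, so it
    stays visible even if a test runner has replaced the standard streams.
    """
    if stream is None:
        stream = sys.__stderr__
    stream.write(f"Install R packages: {', '.join(package_names)}? [y/N] ")
    stream.flush()
    # Anything other than an explicit yes counts as a refusal.
    return reader().strip().lower() in {"y", "yes"}
```

The `stream` and `reader` parameters are there only so the function can be exercised without a real terminal, for example `ask_permission(["dagitty"], reader=lambda: "y")`.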

@bobturneruk
Contributor Author

Yeah, we don't want GBs of data in here. Another repo may be a temporary solution. git-annex and DVC may also be worth exploring.

@jmafoster1
Contributor

DVC looks quite good, especially if we can hook it up to Google docs somehow. I think much of our "big data" will come from Bessemer, so it'd be nice to have a more efficient way of getting the data off there without having to zip it, scp it, unzip it, and then commit it somewhere.

I was also saying to Andy earlier that I think we should keep our academic evaluation separate from the tool, so that it's simpler and more streamlined for people who subsequently want to download it and actually use it to test their own models. I'm still not entirely sure what the "tool" will end up being. The main causcumber contribution so far seems to be an aggregation and application of different existing techniques.
