Proposal: Separate code from instance-specific scripts & artifacts #88

Open
jwodder opened this issue Aug 5, 2024 · 2 comments

@jwodder
Member

jwodder commented Aug 5, 2024

Proposal: Split this repository into two repositories, one for just the dandisets-healthstatus code and one for all files specific to applying the code to the production DANDI Archive instance.

  • The code repository (which could either be this repository or a new dandisets-healthstatus-code repository) will contain:

    • code/ (with leading directory stripped)
    • setup.cfg
    • .pre-commit-config.yaml
    • .gitignore
    • .github/
    • WEBDAV.md (possibly merged into a dedicated README for the new repo)
  • The production DANDI repository (possible name: dandisets-healthstatus-dandi) will contain scripts for running the code against the production DANDI Archive instance as well as all files produced by such runs; specifically:

    • tools/
    • results/
      • This will also include the output logs for issue 72 and the event logs for issue 82.
    • README.md (autogenerated)
    • .gitattributes
    • environments.yaml file for issue 82
  • The tools/run.sh script will install the code by running pip install git+https://github.com/dandi/dandisets-healthstatus-code

    • There will be no submodules in this setup, as those are too much pain for too little gain.
    • Problem: Because this command installs from a VCS URL, pip will not reinstall the code unless the package's version changes, even if the --upgrade option is passed, and we don't maintain a changing version for dandisets-healthstatus.
      • Possible solution 1: Always uninstall dandisets-healthstatus from the venv, if present, before running the installation command, thereby ensuring that the latest code is installed on every run. (This is what the backups2datalad scripts do.)
      • Possible solution 2: Use versioningit to include commit details in dandisets-healthstatus's version
  • I think it may be best/cleanest to perform this split as follows:

    • Create two new empty repositories: dandi/dandisets-healthstatus-code and dandi/dandisets-healthstatus-dandi
      • While we're at it, could we drop the "dandisets-" from the repository names? The dandi/ organization already implies that the repositories have something to do with Dandisets, and I don't like how long this repository's name is already. (dandisets-healthstatus-dandi could then become healthstatus-prod or similar.)
    • Use git-filter-repo to produce a stripped-down version of this repository containing only the code and its history, and push this to the code repository
    • Either use git-filter-repo to do likewise for the production assets or else (since the results/ history isn't all that informative) just copy tools/, results/, etc. to a new, history-less repository. Either way, the outcome is pushed to the production repository.
    • Update tools/run.sh for the new setup
    • Archive this repository
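
For reference, "Possible solution 1" above could look roughly like the following in tools/run.sh. This is only a sketch: the function name reinstall_code and the PIP, PKG, and REPO_URL variables are made up for illustration, and it assumes the script runs with the target venv already activated.

```shell
#!/bin/sh
# Hypothetical helper for tools/run.sh (names below are illustrative).
# PIP can be overridden, e.g. PIP=echo for a dry run that only prints the commands.
PIP="${PIP:-python3 -m pip}"
PKG="${PKG:-dandisets-healthstatus}"
REPO_URL="${REPO_URL:-https://github.com/dandi/dandisets-healthstatus-code}"

reinstall_code() {
    # Remove any previously installed copy so that the next install always
    # fetches the current state of the repository. A failure here (for
    # example, on the very first run) is tolerated.
    $PIP uninstall --yes "$PKG" || true
    # Install the code straight from the VCS URL named in the proposal.
    $PIP install "git+$REPO_URL"
}
```

Calling reinstall_code near the top of run.sh, before the health checks start, would then pick up the latest code on every run without needing a changing version number.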

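The git-filter-repo step for the code repository could be sketched as below. The path list is taken from the proposal; the function name filter_code_repo and the GIT and CODE_PATHS variables are assumptions for illustration. It is meant to be run inside a fresh clone, since git-filter-repo rewrites history in place, and whether moving code/ to the root collides with any existing top-level file would need checking first.

```shell
#!/bin/sh
# Hypothetical sketch of the history-preserving split (names are illustrative).
# GIT can be overridden, e.g. GIT=echo to print the command instead of running it.
GIT="${GIT:-git}"

# Paths that the proposal assigns to the code repository.
CODE_PATHS="code/ setup.cfg .pre-commit-config.yaml .gitignore .github/ WEBDAV.md"

filter_code_repo() {
    # Build one "--path <p>" pair per path to keep.
    set --
    for p in $CODE_PATHS; do
        set -- "$@" --path "$p"
    done
    # Keep only the listed paths, and rename code/ to the repository root
    # (the "leading directory stripped" part of the proposal).
    $GIT filter-repo "$@" --path-rename code/:
}
```

After running filter_code_repo in the clone, the result would be pushed to the new code repository as described in the steps above.
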
@yarikoptic: Your thoughts?

@jwodder
Member Author

jwodder commented Aug 12, 2024

@yarikoptic Ping.

@yarikoptic
Member

you have my blessing
