Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add default BioC_mirror for CI=TRUE env #187

Open
wants to merge 13 commits into
base: devel
Choose a base branch
from
Open

Conversation

LiNk-NY
Copy link
Contributor

@LiNk-NY LiNk-NY commented Mar 4, 2024

Hi Martin, @mtmorgan
Here is a draft PR to re-direct CI=TRUE environments to low-cost egress services via the BioC_mirror option.

cc: @almahmoud @vjcitn

@LiNk-NY LiNk-NY requested a review from mtmorgan March 4, 2024 20:45
Copy link
Collaborator

@mtmorgan mtmorgan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT: Maybe I'm overthinking the comments below; the ci-mirror will just be a simple sync of the default mirror, not trying to deliver anything special for the specific CI platforms.... sorry for going off on a tangent...

This seems to commit Bioc to maintaining a complete mirror, and therefore all repositories -- software, annotation, experiment, workflow, book -- for each version and CI platform.

Also, if I understand, the change replaces the default mirror (https:/bioconductor.org), so that if a package is, for whatever reason, not in the container repository, it will not be found in the default repository.

The strategy adopted for supporting binary containers is different -- a single repository for each binary platform, placed at higher priority that the default mirror.

Is there a reason not to generalize the container approach, with a url like CI_BASE_URL <- "https://bioconductor.org/packages/<BIOC_VERSION>/ci_packages/<CI_PLATFORM>" (or https://ci-mirror.bioconductor.org/<BIOC_VERSION>/<CI_PLATFORM>)? The first formulation might be preferred for consistency with the binary package solution.

The single repository at the end of this URL would include all packages available for BIOC_VERSION and CI_PLATFORM (rather than separate software, annotation, etc., remembering that package names have to be globally unique anyway). Packages not (yet) available for a CI platform would be found in the default mirror.

The patch requires documentation & unit tests to the extent that these are possible.

Refactoring . env_opt_lgl() might be taken to a separate pull request, and applied systematically?

R/repositories.R Outdated Show resolved Hide resolved
R/repositories.R Outdated Show resolved Hide resolved
R/repositories.R Outdated Show resolved Hide resolved
R/utilities.R Outdated Show resolved Hide resolved
@almahmoud
Copy link

almahmoud commented Mar 5, 2024

I think a complete mirror (EDIT: for current release and devel only, previous versions are always routing to an egress-free bucket anyway) is the goal, first by having a constant syncing mechanism, then ideally by having the BBS machine push to all the bucket destinations itself, with the idea of potentially also routing heavy traffic to this mirror after hitting a certain threshold in a certain time period, to avoid excess egress fees from a single source, which is the context in which this came up again.

With that being said, I agree that completely replacing bioconductor.org is probably not the best way to go. The idea was very much inspired by the container-binaries reroute in BiocManager, and doing it similarly sounds great if that can be done for all OSes and source packages. The implementation would be slightly different as it won't be per platform per se, so can't rely on an environment variable identifying the platform directly, so would have to start by detecting CI=TRUE and then still routing based on the OS or from source, as it would on any other unknown system, which I think could probably be done by generating the CI_PLATFORM portion programmatically, if I'm understanding things correctly.
If that method can also be done for source packages that would be amazing, and in line with the original rough idea of replicating the container-binaries reroute for an egress-free mirror, to have the mirror supersede where possible while maintaining the ability to default to original bioconductor.org if the mirror is down or not up-to-date yet for that particular package.

^^All of the above with the disclaimer that I am very much a youngling when it comes to understanding BiocManager and the BBS/Bioc ecosystem, so please take all my ideas/opinions with a kg of salt, and I am of course very open to be corrected!

@vjcitn
Copy link

vjcitn commented Mar 5, 2024

thanks for all the discussion here. I am very concerned about the complexity. I wonder whether biocmanager is becoming overloaded with functions that do not address its main concerns of organizing an installation for a user. it would be good to think about factoring repository- and version- and platform-oriented calculations away from "processing user request to valid installation" calculations. this could be a basis for well-designed tests?

@LiNk-NY
Copy link
Contributor Author

LiNk-NY commented Mar 5, 2024

Thanks for the review Martin. I will work on updates.

I think setting a BioC_mirror for CI environments would be sufficient. The mirror would host both source and binary packages for all platforms (which we already produce) including https://ci-mirror.bioconductor.org/packages/X.YY/container-binaries/bioconductor_docker which would be the preferred setup method for CI/CD platforms. It would only be a matter of rsycing the assets over to the new URL.

@mtmorgan
Copy link
Collaborator

mtmorgan commented Mar 5, 2024

I think setting a BioC_mirror for CI environments would be sufficient. The mirror would host both source and binary packages for all platforms (which we already produce) including https://ci-mirror.bioconductor.org/packages/X.YY/container-binaries/bioconductor_docker which would be the preferred setup method for CI/CD platforms. It would only be a matter of rsycing the assets over to the new URL.

Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...). Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...

- fallback to bioconductor.org
- update docs
- read from config.yaml
- find CI in institution value
@almahmoud
Copy link

As a first pass, I've set up a tentative configuration for the ci-mirror.bioconductor.org endpoint where /packages/[3.18 | release | 3.19 | devel] get rerouted to the egress-free bucket, while any other path defaults back to bioconductor.org

@LiNk-NY
Copy link
Contributor Author

LiNk-NY commented Mar 6, 2024

Good point. Does the current patch actually redirect binary queries to ci-mirror... or is the binary URL hard-coded? I'm not sure (maybe it does...) that the usual mirroring mechanism (rsync) supports the path to binaries (there's some authentication step involved, and it might restrict what can be synced...).

The current patch checks for a valid /container-binaries/ URL. If not found, it will fallback to bioconductor.org.

Also not sure that one would want the binary URL to be built on the mirror, since mirrors will often not contain the binary repository...

Yes, they don't often contain the binary repository but now they can

@hpages
Copy link
Contributor

hpages commented Mar 6, 2024

Where is .BIOCONDUCTOR_CI_MIRROR defined? Unfortunately .url_exists() still "works" (and returns FALSE) when the variable passed to it isn't defined.

@LiNk-NY
Copy link
Contributor Author

LiNk-NY commented Mar 6, 2024

Where is .BIOCONDUCTOR_CI_MIRROR defined? Unfortunately .url_exists() still "works" (and returns FALSE) when the variable passed to it isn't defined.

Thanks, that bit of code I just removed.

@LiNk-NY LiNk-NY marked this pull request as ready for review March 6, 2024 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants