
Rework GitHub Actions workflows to build packages --> test packages #584

Open
ScottTodd opened this issue Nov 21, 2024 · 5 comments

@ScottTodd (Member)

These workflows all currently build shortfin from source, duplicating all the boilerplate to fetch dependencies in some carefully balanced order:

For workflows that run on `pull_request` and `push` triggers, we can add a `build_dev_packages` job similar to https://github.com/nod-ai/shark-ai/blob/main/.github/workflows/build_packages.yml that builds the packages, then have those workflows install the built artifacts from that job. For workflows that run on `schedule`, we can either do the same thing or use the already-built nightly packages (docs: https://github.com/nod-ai/shark-ai/blob/main/docs/nightly_releases.md).
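
A rough sketch of that shape, assuming hypothetical job and artifact names (`build_dev_packages`, `dev-packages`) and an illustrative test job; the real workflows would build all of the shortfin/sharktank/shark-ai packages and pick their own runners and test commands:

```yaml
# Sketch only: job names, artifact names, paths, and runner labels are illustrative.
jobs:
  build_dev_packages:
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - name: Build dev wheels
        run: python -m pip wheel --disable-pip-version-check -v -w wheelhouse ./shortfin
      - uses: actions/upload-artifact@v4
        with:
          name: dev-packages
          path: wheelhouse/

  test_shortfin:
    needs: build_dev_packages
    runs-on: ubuntu-24.04
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: dev-packages
          path: wheelhouse/
      - name: Install from the built artifacts (no source build in this job)
        run: python -m pip install --find-links wheelhouse shortfin
      - name: Run tests
        run: python -m pytest shortfin/tests
```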

In both cases, the complexity of package building will be isolated to a few package-oriented workflows, and we'll gain confidence that the test jobs are compatible with our releases, so users will be able to use the released packages without needing to build from source either.

Once we have something working, we can optimize the package build to improve CI turnaround times:

  • cache pip dependencies (see the caching sketch after this list)
  • cache CMake builds (or the entire build - see what IREE does)
  • cache Dockerfiles
  • skip the tracy build variant:

    ```bash
    function build_shortfin() {
      export SHORTFIN_ENABLE_TRACING=ON
      python -m pip wheel --disable-pip-version-check -v -w "${OUTPUT_DIR}" "${REPO_ROOT}/shortfin"
    }
    ```

    ```python
    ENABLE_TRACY = get_env_boolean("SHORTFIN_ENABLE_TRACING", False)
    ```

    shark-ai/shortfin/setup.py, Lines 260 to 263 in 06599e9:

    ```python
    try:
        self.build_default_configuration()
        if ENABLE_TRACY:
            self.build_tracy_configuration()
    ```
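
For the pip-caching bullet above, a minimal sketch using actions/cache, assuming the requirements files are what should key the cache; the cache path and key names are illustrative, not settings the repo already uses:

```yaml
# Sketch only: cache path and key inputs are illustrative.
- name: Cache pip downloads
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: pip-${{ runner.os }}-${{ hashFiles('**/requirements*.txt') }}
    restore-keys: |
      pip-${{ runner.os }}-
- name: Install requirements
  run: python -m pip install -r requirements.txt
```

(actions/setup-python also has a built-in `cache: pip` option that covers the simple case.)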

See https://github.com/iree-org/iree/blob/main/.github/workflows/pkgci.yml for the shape of this sort of setup in IREE.

@ScottTodd (Member Author)

Proof of concept migration of one workflow: #625. This added 1 minute to total workflow time but has a few scaling benefits. Going to let that sit for a bit and run some more experiments.

The main time sink is installing Python packages (even if already downloaded/cached). Workflows that use persistent self-hosted runners currently don't use venvs, so they risk having packages left over from previous jobs and either installing conflicting versions of packages or failing to install the requested versions entirely. The new setup_venv.py code (forked from IREE) installs the dev packages and requirements sequentially, but we might be able to optimize that a bit while still retaining predictability.
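
One way to keep persistent self-hosted runners predictable without paying much extra time is a throwaway venv per job; a rough sketch of that idea (step contents are illustrative and not what setup_venv.py actually does):

```yaml
# Sketch only: create a fresh venv per job so packages left over from previous
# jobs on a persistent runner can't leak in, then put it on PATH for later steps.
- name: Create fresh venv
  run: |
    python -m venv "${{ runner.temp }}/venv"
    echo "${{ runner.temp }}/venv/bin" >> "$GITHUB_PATH"
- name: Install dev packages and requirements
  run: |
    python -m pip install --find-links wheelhouse shortfin sharktank
    python -m pip install -r requirements.txt
```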

@stellaraccident (Contributor)

You may want to look at using uv as a pip replacement when latency is a concern. I dislike forked tool flows, but it seems like a lot of folks are having a good experience there.

@ScottTodd (Member Author)

Recipes for using uv: https://github.com/astral-sh/uv?tab=readme-ov-file#a-pip-compatible-interface . Definitely worth trying out.
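
A quick way to try the pip-compatible interface in a workflow step (uv's pip commands expect a virtual environment, so this sketch creates one with `uv venv`; the requirements file is illustrative):

```yaml
# Sketch only: swap pip for uv's pip-compatible interface to compare install times.
- name: Install requirements with uv
  run: |
    python -m pip install uv
    uv venv .venv
    source .venv/bin/activate
    uv pip install -r requirements.txt
```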

@marbre (Collaborator) commented Nov 28, 2024

If you want to build a package, you want to use `uv build`, not `uv pip`. The equivalent of `python -m pip wheel -v -w wheeldir .` would be `uv build --wheel -v -o wheeldir .`. I would say uv is definitely an alternative, especially since `uv venv` is really nice. Furthermore, uv can install different Python versions, so it's also a replacement for pyenv. However, I also ran into some issues in the past, but it's definitely worth giving it another try.
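
Applied to the build_shortfin function quoted earlier in this issue, that mapping might look roughly like the step below; this is an untested sketch (the output directory is illustrative, and whether shortfin's custom setup.py builds cleanly under uv build would need verifying):

```yaml
# Sketch only: the uv equivalent of the pip wheel invocation in build_shortfin.
- name: Build shortfin wheel with uv
  run: |
    python -m pip install uv
    uv build --wheel -v -o wheelhouse ./shortfin
```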

@ScottTodd (Member Author)

The bottleneck I'd like to optimize is the 2m30s spent installing packages (including deps), not the 1m30s building the shortfin/sharktank/shark-ai packages. See logs at https://github.com/nod-ai/shark-ai/actions/runs/12059301876/job/33628235219?pr=625#step:5:35 :

```
Wed, 27 Nov 2024 23:02:34 GMT
Installing collected packages: mpmath, typing-extensions, sympy, networkx, MarkupSafe, fsspec, filelock, jinja2, torch
Wed, 27 Nov 2024 23:03:02 GMT
Successfully installed MarkupSafe-2.1.5 filelock-3.13.1 fsspec-2024.6.1 jinja2-3.1.4 mpmath-1.3.0 networkx-3.3 sympy-1.13.1 torch-2.3.0+cpu typing-extensions-4.12.2
...
Wed, 27 Nov 2024 23:03:13 GMT
Installing collected packages: pytz, xxhash, urllib3, tzdata, tqdm, sniffio, six, safetensors, regex, pyyaml, pydantic-core, pyarrow, propcache, packaging, numpy, multidict, idna, h11, frozenlist, dill, click, charset-normalizer, certifi, attrs, annotated-types, aiohappyeyeballs, yarl, uvicorn, requests, python-dateutil, pydantic, multiprocess, iree-base-runtime, iree-base-compiler, gguf, anyio, aiosignal, starlette, pandas, iree-turbine, huggingface-hub, aiohttp, tokenizers, fastapi, transformers, datasets
Wed, 27 Nov 2024 23:04:07 GMT
Successfully installed aiohappyeyeballs-2.4.3 aiohttp-3.11.8 aiosignal-1.3.1 annotated-types-0.7.0 anyio-4.6.2.post1 attrs-24.2.0 certifi-2024.8.30 charset-normalizer-3.4.0 click-8.1.7 datasets-3.0.1 dill-0.3.8 fastapi-0.112.2 frozenlist-1.5.0 gguf-0.10.0 h11-0.14.0 huggingface-hub-0.22.2 idna-3.10 iree-base-compiler-3.0.0 iree-base-runtime-3.0.0 iree-turbine-3.0.0 multidict-6.1.0 multiprocess-0.70.16 numpy-1.26.4 packaging-24.2 pandas-2.2.3 propcache-0.2.0 pyarrow-18.1.0 pydantic-2.10.2 pydantic-core-2.27.1 python-dateutil-2.9.0.post0 pytz-2024.2 pyyaml-6.0.2 regex-2024.11.6 requests-2.32.3 safetensors-0.4.5 six-1.16.0 sniffio-1.3.1 starlette-0.38.6 tokenizers-0.19.1 tqdm-4.67.1 transformers-4.40.0 tzdata-2024.2 urllib3-2.2.3 uvicorn-0.30.6 xxhash-3.5.0 yarl-1.18.0
```

The build steps can be optimized too, but 1m30s on a standard runner with a (very low) 40% cache hit rate is pretty respectable already.
