[For Maintainers] Notes on `dev` ‐ `main` merges

Jan Wassenberg edited this page Apr 6, 2024 · 6 revisions

Procedure

All project maintainers are trusted to unilaterally do dev -> main merges as they see fit. The recommended process is through the git CLI (typically preferred over a PR, since the latter introduces an extra commit). The procedure is pretty simple:

  1. Make sure both dev and main are up-to-date.

  2. With the dev branch checked out, make sure you are able to do the following at a minimum: a) run a 2B IT model and produce a correct generation for some standard inputs ("hi how are you?", "tell me about places to visit in [geographic location]", "write code to do [simple task]"), and b) try at least one random new interaction that hasn't been tested recently (e.g. a different prompt or different commands); I usually do this as well.

Beyond that, any additional testing e.g. with 7B IT is of course beneficial, but the main goal of this final manual approval gate is to sanity check end-to-end behavior before it impacts everyday users checking out from main.

  3. With the main branch checked out, run git merge dev.

  4. Run git push to update the GitHub repo.
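
The steps above can be sketched as a small shell function. This is only a sketch under stated assumptions: the remote is assumed to be named origin unless another name is passed, the manual checks from step 2 are assumed to have already passed on dev, and the function name merge_dev_to_main is hypothetical (it is not part of the repository).

```shell
# Sketch of the dev -> main merge described above.
# Assumptions: the remote defaults to "origin"; the manual sanity checks
# on dev have already been done by the maintainer.
merge_dev_to_main() {
  local remote="${1:-origin}"

  # Step 1: make sure both dev and main are up to date.
  git fetch "$remote" || return 1
  git checkout dev || return 1
  git pull --ff-only "$remote" dev || return 1
  git checkout main || return 1
  git pull --ff-only "$remote" main || return 1

  # Step 3: merge dev into main (main is checked out at this point).
  git merge dev || return 1

  # Step 4: publish the updated main branch.
  git push "$remote" main
}
```

When main has not diverged from dev, git merge fast-forwards by default, so no additional merge commit is created.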

Goals / Rationale

Using automation for PR->dev merges and a manual release gate for dev->main is meant to balance two forms of availability:

  • Don't block - Minimize blocking / friction for PRs: take advantage of CI / automation to merge PRs aggressively at relatively high velocity / low overhead, and minimize blocking on the availability of maintainers.
  • Don't break - Most end users probably don't care about any individual PR and would be more negatively affected if functionality either broke or changed in unexpected ways. We also want to avoid quasi-pager emergencies of fixing breaking changes or rolling back subtly breaking commits over the weekend/after hours.

This dual track of fast iteration on dev and manual gating on main gets us most of the benefits of both a continuous automated release and a manual release process owned by a release manager:

  • For power users (e.g. bindings maintainers) who want to track the latest PRs, there's little cost to this, as they can continuously track dev.
  • Non-power users probably only notice if something is broken; main provides a buffer between fast iteration and their UX.

Possibilities for More Automation and Limitations

Can we automate more or all of this?

I think the boundaries of automation could be pushed further. The main blocker is probably that compared to other CI testing, the model artifacts are fairly large and the compute overhead is fairly high. These are not insurmountable though and it would be beneficial to be able to do a "hi how are you?"-type test as part of CI.
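
A "hi how are you?"-type check could look roughly like the following. This is a minimal sketch, not an existing CI job: GEN_CMD is a hypothetical placeholder for whatever command runs the model (assumed to read a prompt on stdin and write the generation to stdout), and smoke_test is a hypothetical name. It only verifies that the pipeline runs and produces some text; it does not judge whether the generation is correct.

```shell
# Sketch of a CI smoke test. GEN_CMD is a hypothetical placeholder for the
# command that runs the model: prompt on stdin, generated text on stdout.
smoke_test() {
  local prompt="$1" min_chars="${2:-1}" output chars

  # Run the generator; a non-zero exit status counts as a failure.
  output="$(printf '%s\n' "$prompt" | $GEN_CMD)" || {
    echo "FAIL: generator exited with an error for prompt: $prompt" >&2
    return 1
  }

  # Strip whitespace and require a minimum amount of generated text.
  chars="$(printf '%s' "$output" | tr -d '[:space:]' | wc -c | tr -d ' ')"
  if [ "$chars" -lt "$min_chars" ]; then
    echo "FAIL: only $chars non-space characters for prompt: $prompt" >&2
    return 1
  fi
  echo "PASS: $prompt"
}
```

With GEN_CMD pointing at a wrapper script for the model binary, a CI step could call smoke_test "hi how are you?" 10 and fail the build on a non-zero return.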

However, the interaction surface of LLM apps is not 1:1 transferable from standard applications. The surface area of interaction is a lot more amorphous, and there are corner cases that, if tested, have negative externalities such as inordinately increasing the CI time.

Two small examples: 1) Code/model artifact syncing - if we update the model artifact (e.g. the recent MHA -> MQA change), including in the CI, we could easily miss that this impacts users with either a new version of the code + an old version of the artifact, or an old version of the code + a new version of the artifact. Running into this in the manual check leads us to update docs + error messages to minimize the impact. 2) Generation length - while CI is probably limited to short generations to avoid increasing overhead, some bugs may only be visible after longer generations (generation quality degradation, fence-post errors).

Although these two issues could benefit from at least partial mitigation, LLM testing has unique concerns that make comprehensive automated testing a challenge, and at the same time gating dev/main has relatively few costs. Consider also an idealized version of automation where evals are run end-to-end: even in that scenario, we run into LLM evals being an open research problem, and they still need to be complemented by manual out-of-distribution checking.

Releases

We've cut releases relatively informally (see https://github.com/google/gemma.cpp/releases). With an OSS project like this, releases are primarily symbolic, but they can be useful to cut:

  • To demarcate substantial updates/changes to functionality, features and/or performance
  • To engage the community + provide public recognition for contributions

GitHub has some nice automation around release notes - note that the What's Changed and New Contributors sections in 0.1.1 are automated suggestions.

We're not targeting a particular cadence. With the current velocity of the project, roughly once every 1-2 months feels like enough of a change to cut a minor version. This could slow down (e.g. quarterly or twice a year if development is at a steady state) or speed up if there's a particular major feature release planned (e.g. a new model variation supported or a major refactoring).

Process

  • Go to https://github.com/google/gemma.cpp/releases
  • Click the "Draft New Release" button
  • Choose the next version number from the "Choose a Tag" drop down
  • Use the same version number for the "Release Title" text box
  • Enter a few high-level bullet points in the description box
  • Click "Generate release notes", which fills in the callouts to PRs, contributors :)
  • At the bottom, click either "Publish Release" or "Save draft"
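
The same steps can also be scripted instead of clicked through. The following is a minimal sketch, assuming the GitHub CLI (gh) is installed and authenticated with push access; the function name draft_release is hypothetical, and the tag is whatever next version number you chose. It saves a draft rather than publishing, so the high-level bullet points can still be added in the web UI before release.

```shell
# Sketch: create a draft release with auto-generated notes via the GitHub CLI.
# Assumption: gh is installed and authenticated for the repository.
draft_release() {
  local tag="$1"
  if [ -z "$tag" ]; then
    echo "usage: draft_release <tag>" >&2
    return 2
  fi
  # --title reuses the version number as the release title.
  # --generate-notes fills in the PR and contributor callouts.
  # --draft saves the release without publishing it.
  gh release create "$tag" \
    --repo google/gemma.cpp \
    --title "$tag" \
    --generate-notes \
    --draft
}
```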