In today's demo, we will review how Flagger uses App Mesh to automatically and incrementally shift traffic to an upgraded version of an application before promoting it to serve 100% of live traffic. We will also demonstrate a "failed" deployment scenario, in which failing metrics cause the upgrade to halt and roll back.
Tools used for this demo include:
- eksctl to create the cluster. eksctl is the official CLI tool, created by Weaveworks, for creating clusters on AWS.
- eksctl enable repo to install GitOps Operator tools (FluxCD, Helm Operator)
- eksctl enable profile appmesh to install:
  - Kubernetes custom resources: mesh, virtual nodes and virtual services
  - CRD controller: keeps the custom resources in sync with the App Mesh control plane
  - Admission controller: injects the Envoy sidecar and assigns pods to App Mesh virtual nodes
  - Telemetry service: Prometheus instance that collects and stores Envoy’s metrics
  - Progressive delivery operator: Flagger instance that automates canary releases on top of App Mesh
- App Mesh as the service mesh (comprising the App Mesh controller, CRDs, Grafana, sidecar injector, and Prometheus, installed via the appmesh profile and eks-charts)
- GitOps operational method to ensure Git is the source of truth for declarative infrastructure and applications
- fluxctl, the CLI for Flux, the GitOps operator created by Weaveworks that automatically applies and deploys the contents of your Git repository. fluxctl is another Weaveworks OSS project; we use it in today's demo to save time.
- flagger to implement progressive delivery. Created by Weaveworks, Flagger is a Kubernetes operator that automates the promotion of canary deployments.
Today's demo uses a sample microservice, podinfo, to mimic an application for us to Canary. Version 3.1.0 is currently
deployed, and to make the change visible, we will edit the image displayed on the page as we edit the podinfo image tag.
You can get the ingress public address by running:
export URL="http://$(kubectl -n demo get svc/appmesh-gateway -ojson | jq -r ".status.loadBalancer.ingress[].hostname")"
echo $URL
To demonstrate an automated Canary promotion, we will make a very small commit that updates the image tag to 3.1.1
and changes the env variable PODINFO_UI_LOGO to display https://eks.handson.flagger.dev/cuddle_bunny.gif instead.
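The tag change in that commit can be sketched as a small shell helper. This is only an illustration: the function name and the manifest path in the usage comment are hypothetical, and the PODINFO_UI_LOGO value still needs to be edited in the same manifest before committing.

```shell
# Hypothetical helper: rewrite "podinfo:<old>" to "podinfo:<new>" in a manifest.
bump_podinfo_tag() {
  manifest="$1"; old="$2"; new="$3"
  tmp="$(mktemp)"
  # Escape dots so "3.1.0" matches literally rather than as a regex wildcard.
  old_re="$(printf '%s' "$old" | sed 's/\./\\./g')"
  # Write to a temp file first, then copy back so the manifest keeps its permissions.
  sed "s/podinfo:${old_re}/podinfo:${new}/" "$manifest" > "$tmp" \
    && cat "$tmp" > "$manifest"
  rm -f "$tmp"
}

# Example usage (the path is an assumption; use the one in your repository):
#   bump_podinfo_tag workloads/podinfo.yaml 3.1.0 3.1.1
#   git commit -am "Update podinfo to 3.1.1" && git push
```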
A quick note here: this step of updating the image tag is typically handled by Flux. Flux can monitor the image repository where you push your application images and make this commit on your behalf. You can control which updates Flux releases automatically by using regular expressions in a Flux annotation. We won't demo this, but it means delivery can be automated safely once a development team has successfully built their application's image.
To watch the Canary progress, we will monitor the canaries via:
kubectl -n demo get canaries --watch
and the Flagger logs via:
kubectl -n appmesh-system logs deployment/flagger -f | jq .msg
We expect this upgrade to progress successfully, with traffic weights incremented as specified in canary.yaml. At 50%, we will
see the promotion begin to meet the HPA minimum (2 replicas, specified in hpa.yaml).
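To get a feel for the weights we expect to see in the watch output, here is a rough sketch that prints the steps for a given increment and maximum. The function is hypothetical, and the step of 10 in the usage comment is an assumption; the real values are whatever stepWeight and maxWeight are set to in canary.yaml.

```shell
# Print the traffic weights a canary would step through.
# step and max are stand-ins for the values configured in canary.yaml.
canary_weights() {
  step="$1"; max="$2"
  # Refuse a non-positive step, which would loop forever.
  [ "$step" -gt 0 ] || return 1
  weight="$step"
  out=""
  while [ "$weight" -le "$max" ]; do
    out="${out:+$out }$weight"
    weight=$((weight + step))
  done
  printf '%s\n' "$out"
}

# Example usage (assumed step of 10, maximum of 50):
#   canary_weights 10 50
```

With an assumed step of 10 and a maximum of 50, this prints 10 20 30 40 50, after which promotion would begin.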
To demonstrate a faulty release scenario, we will inject some HTTP 500 errors during the promotion of the Canary
to trigger an automated rollback.
We will edit the image tag to 3.1.2.
To fabricate some failed requests, we'll first exec into the flagger-loadtester pod:
kubectl -n demo exec -it $(kubectl -n demo get pods -o name | grep -m1 flagger-loadtester | cut -d'/' -f 2) -- bash
We'll use hey to generate some requests that return HTTP 500, followed by some delayed requests:
hey -z 1m -c 5 -q 5 http://podinfo-canary.demo:9898/status/500 && \
hey -z 1m -c 5 -q 5 http://podinfo-canary.demo:9898/delay/1
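As a rough sanity check on how much traffic these commands generate: -c is the number of concurrent workers, -q is the rate limit per worker in queries per second, and -z is the duration. The helper below is a hypothetical sketch that computes the resulting upper bound.

```shell
# Upper bound on requests hey can send: workers * per-worker QPS * seconds.
hey_max_requests() {
  workers="$1"; qps="$2"; seconds="$3"
  echo $((workers * qps * seconds))
}

# Example usage for "-z 1m -c 5 -q 5":
#   hey_max_requests 5 5 60
```

For the /status/500 run this gives at most 1500 requests in the minute. The /delay/1 run will send fewer, since each worker waits about a second for every response.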
To watch the Canary progress, we will monitor the canaries via:
kubectl -n demo get canaries --watch
and the Flagger logs via:
kubectl -n appmesh-system logs deployment/flagger -f | jq .msg
In this example, we expect to see the Canary progress until the errors take place. Once the errors push the success rate
below 99% (set in canary.yaml), we expect Flagger to stop advancing the Canary while the checks keep failing. Once the
threshold for failed checks is met, we expect the promotion to be aborted and all traffic to be routed back to the
stable version of your application.
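The halt-then-rollback behaviour can be approximated with a small simulation. This is a simplified model for intuition only, not Flagger's actual implementation: the function name is hypothetical, success rates are whole percentages, and the failed-check threshold of 2 in the usage comment is an assumed value.

```shell
# Simplified model of a canary analysis loop, not Flagger's actual code.
# Args: minimum success rate (%), failed-check threshold, then one observed
# success rate (%) per analysis interval.
analyze_canary() {
  min_rate="$1"; threshold="$2"; shift 2
  failed=0
  for rate in "$@"; do
    if [ "$rate" -lt "$min_rate" ]; then
      failed=$((failed + 1))
      echo "halt advancement: success rate ${rate}% < ${min_rate}% (failed checks: ${failed})"
      if [ "$failed" -ge "$threshold" ]; then
        echo "rollback: route all traffic to the stable version"
        return 1
      fi
    else
      echo "advance: success rate ${rate}%"
    fi
  done
  echo "promote"
}

# Example usage (99% minimum from the text, assumed threshold of 2):
#   analyze_canary 99 2 100 80 100 70
```

In the usage example, the second and fourth intervals fall below 99%, so the second failure reaches the assumed threshold and the function reports a rollback.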
If you would like to run through this demo on your own, please take a look at Workshop 5, Accelerating the Software Development Lifecycle. You can use your own AWS account to work through this demo.