Wasm binary served from kuadrant operator workload #686

eguzki · 2024-06-03T22:17:52Z

What

part of #325
Follow-up work after discarding #593

The rate limiting wasm binary is embedded in the kuadrant operator at build time. The kuadrant operator serves the wasm binary on port 8082 (configurable) at the endpoint /kuadrant-wasm-shim. The operator's build process requires the wasm binary to be available locally; the SHA256 checksum of the binary is then computed and stored internally in a Golang variable. To make the wasm binary available locally at build time, there is a new makefile target that fetches the specified version of the wasm binary from the GitHub release assets:

make wasm-shim WASM_SHIM_VERSION=vX.Y.Z
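To illustrate the serving side described above, the Go sketch below keeps the module bytes and their SHA256 checksum in package-level variables and serves the bytes at /kuadrant-wasm-shim. It is a minimal sketch under stated assumptions, not the operator's code: wasmShimBinary is a placeholder (in the real build the bytes would come from the locally fetched file, perhaps via go:embed), newWasmShimMux and sha256Hex are hypothetical names, and httptest stands in for the listener on port 8082.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// wasmShimBinary is a placeholder for the embedded wasm module
// (just the wasm magic number and version). Hypothetical.
var wasmShimBinary = []byte("\x00asm\x01\x00\x00\x00")

// wasmShimSHA256 is computed once and kept in a package-level variable.
var wasmShimSHA256 = sha256Hex(wasmShimBinary)

// sha256Hex returns the lowercase hex SHA256 digest of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// newWasmShimMux serves the module bytes at /kuadrant-wasm-shim.
func newWasmShimMux(binary []byte) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/kuadrant-wasm-shim", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet && r.Method != http.MethodHead {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/wasm")
		_, _ = w.Write(binary)
	})
	return mux
}

func main() {
	// The operator would listen on :8082; httptest picks a free local port.
	srv := httptest.NewServer(newWasmShimMux(wasmShimBinary))
	defer srv.Close()

	resp, err := http.Get(srv.URL + "/kuadrant-wasm-shim")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	// The downloaded bytes hash to the checksum the operator advertises.
	fmt.Println(resp.StatusCode, sha256Hex(body) == wasmShimSHA256)
}
```

Computing the checksum once, next to the bytes it describes, keeps the value advertised to the gateway and the bytes actually served in sync.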

The kuadrant operator exposes the wasm binary by deploying a new Kubernetes Service called kuadrant-operator-controller-manager-wasm-shim-service. It looks like this (some fields were removed for simplicity):

apiVersion: v1
kind: Service
metadata:
  labels:
    app: kuadrant
    control-plane: controller-manager
  name: kuadrant-operator-controller-manager-wasm-shim-service
  namespace: kuadrant-system
spec:
  ports:
  - name: wasm-shim
    port: 8082
    protocol: TCP
    targetPort: wasm-shim
  selector:
    app: kuadrant
    control-plane: controller-manager
  sessionAffinity: None
  type: ClusterIP

The wasm module integrates with the gateway in the data plane via
the Wasm Network filter.
The source code of the compiled Wasm binaries is hosted at
Kuadrant's Wasm-Shim project.

Currently, at runtime, the Istio control plane downloads an OCI wasm image, usually from an image repository outside the cluster such as quay.io. This clearly opens a door to injecting malicious code.

This architecture enables so-called offline or disconnected installs,
which allow having the entire cluster disconnected from the internet,
at least regarding the Wasm module.

Disconnected install is a full feature in its own right, and engineering has not tested it yet.

How

Istio

The following sequence diagram shows the workflow when Envoy is managed by Istio

sequenceDiagram
    autonumber
    box transparent Kubernetes cluster
    participant K as Kuadrant Operator
    participant I as Istio WasmPlugin
    participant E as Envoy
    end
    K->>I: http://kuadrant-operator-address:8082, sha256
    I->>K: Fetch Wasm binary, verify sha256 checksum
    I->>E: Push Wasm binary
    I->>E: Setup Wasm filter
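Steps 1 and 2 of the diagram amount to a fetch-then-verify pattern: the consumer receives a URL plus an expected checksum and rejects the download if the digest does not match. The Go sketch below shows that pattern in isolation, assuming a plain HTTP GET and a hex-encoded SHA256 digest. It is not Istio's implementation; fetchAndVerify and verifySHA256 are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// verifySHA256 returns an error unless data hashes to expectedHex.
func verifySHA256(data []byte, expectedHex string) error {
	sum := sha256.Sum256(data)
	if got := hex.EncodeToString(sum[:]); got != expectedHex {
		return fmt.Errorf("sha256 mismatch: got %s, want %s", got, expectedHex)
	}
	return nil
}

// fetchAndVerify downloads url and returns the body only if its
// checksum matches expectedHex.
func fetchAndVerify(client *http.Client, url, expectedHex string) ([]byte, error) {
	resp, err := client.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status %d fetching %s", resp.StatusCode, url)
	}
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	if err := verifySHA256(data, expectedHex); err != nil {
		return nil, err
	}
	return data, nil
}

func main() {
	module := []byte("\x00asm\x01\x00\x00\x00") // placeholder module bytes
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write(module)
	}))
	defer srv.Close()

	sum := sha256.Sum256(module)
	good := hex.EncodeToString(sum[:])

	_, err := fetchAndVerify(srv.Client(), srv.URL+"/kuadrant-wasm-shim", good)
	fmt.Println("matching checksum accepted:", err == nil)
	_, err = fetchAndVerify(srv.Client(), srv.URL+"/kuadrant-wasm-shim", "deadbeef")
	fmt.Println("wrong checksum rejected:", err != nil)
}
```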

Verification Steps

Set up the environment:

make local-setup

Request an instance of Kuadrant:

kubectl -n kuadrant-system apply -f - <<EOF
apiVersion: kuadrant.io/v1beta1
kind: Kuadrant
metadata:
  name: kuadrant
spec: {}
EOF

Deploy the toystore example:

kubectl apply -f examples/toystore/toystore.yaml

Create a HTTPRoute to route traffic to the service via Istio Ingress Gateway:

kubectl apply -f - <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: toystore
spec:
  parentRefs:
  - name: istio-ingressgateway
    namespace: istio-system
  hostnames:
  - api.toystore.com
  rules:
  - matches:
    - method: GET
      path:
        type: PathPrefix
        value: "/toys"
    backendRefs:
    - name: toystore
      port: 80
  - matches: # it has to be a separate HTTPRouteRule so we do not rate limit other endpoints
    - method: POST
      path:
        type: Exact
        value: "/toys"
    backendRefs:
    - name: toystore
      port: 80
EOF

Export the gateway hostname and port:

export INGRESS_HOST=$(kubectl get gtw istio-ingressgateway -n istio-system -o jsonpath='{.status.addresses[0].value}')
export INGRESS_PORT=$(kubectl get gtw istio-ingressgateway -n istio-system -o jsonpath='{.spec.listeners[?(@.name=="http")].port}')
export GATEWAY_URL=$INGRESS_HOST:$INGRESS_PORT

Verify the route works:

curl -H 'Host: api.toystore.com' http://$GATEWAY_URL/toys -i
# HTTP/1.1 200 OK

Enforce rate limiting on requests to the Toy Store API

kubectl apply -f - <<EOF
apiVersion: kuadrant.io/v1beta2
kind: RateLimitPolicy
metadata:
  name: toystore
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: HTTPRoute
    name: toystore
  limits:
    "create-toy":
      rates:
      - limit: 5
        duration: 10
        unit: second
      routeSelectors:
      - matches: # selects the 2nd HTTPRouteRule of the targeted route
        - method: POST
          path:
            type: Exact
            value: "/toys"
EOF

Check that the WasmPlugin has been created and that it contains the URL of the kuadrant operator's local service, on port 8082 and endpoint /kuadrant-wasm-shim:

kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.url}"

It should return

http://kuadrant-operator-controller-manager-wasm-shim-service.kuadrant-system.svc.cluster.local:8082/kuadrant-wasm-shim

Note that the URL follows the Kubernetes format http://<service-name>.<namespace>.svc.cluster.local:<port>/<endpoint>.
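The URL format above can be expressed as a small Go helper. This is a hypothetical sketch (wasmShimURL is not a function from the operator) and it assumes the default cluster.local cluster domain; a cluster configured with a different DNS domain would need a different suffix.

```go
package main

import (
	"fmt"
	"strings"
)

// wasmShimURL builds an in-cluster Service URL of the form
// http://<service-name>.<namespace>.svc.cluster.local:<port>/<endpoint>.
// Assumption: the cluster uses the default "cluster.local" domain.
func wasmShimURL(service, namespace string, port int, endpoint string) string {
	return fmt.Sprintf("http://%s.%s.svc.cluster.local:%d/%s",
		service, namespace, port, strings.TrimPrefix(endpoint, "/"))
}

func main() {
	fmt.Println(wasmShimURL(
		"kuadrant-operator-controller-manager-wasm-shim-service",
		"kuadrant-system",
		8082,
		"/kuadrant-wasm-shim",
	))
}
```

With the Service shown earlier, this prints the same URL returned by the kubectl query above.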

Check that the WasmPlugin contains the sha256 checksum of the Wasm binary served by the operator:

kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.sha256}"

It should return a sha256 value (yours may differ):

12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8

The sha256 checksum value should match the one shown in the operator logs

kubectl logs deployment/kuadrant-operator-controller-manager -n kuadrant-system | grep sha256

which gives

2024-06-12T08:54:18Z	INFO	kuadrant-operator	wasm-shim	{"sha256": "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"}
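Comparing two 64-character digests by eye is error-prone. A hypothetical Go helper, sha256FromLogLine, could extract the checksum from a log line like the one above so it can be compared with the WasmPlugin value. It assumes the fields are tab-separated and that the last field is a JSON object with a sha256 key, as in the sample output; the real log format may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// sha256FromLogLine extracts the "sha256" value from a log line whose
// last tab-separated field is a JSON object. Hypothetical helper.
func sha256FromLogLine(line string) (string, error) {
	// If there is no tab, LastIndex returns -1 and the whole line is used.
	payloadText := strings.TrimSpace(line[strings.LastIndex(line, "\t")+1:])
	var payload struct {
		SHA256 string `json:"sha256"`
	}
	if err := json.Unmarshal([]byte(payloadText), &payload); err != nil {
		return "", fmt.Errorf("parsing log payload: %w", err)
	}
	if payload.SHA256 == "" {
		return "", fmt.Errorf("no sha256 field in %q", payloadText)
	}
	return payload.SHA256, nil
}

func main() {
	logLine := "2024-06-12T08:54:18Z\tINFO\tkuadrant-operator\twasm-shim\t" +
		`{"sha256": "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"}`
	// Value returned by the kubectl jsonpath query on the WasmPlugin.
	fromPlugin := "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"

	fromLogs, err := sha256FromLogLine(logLine)
	if err != nil {
		panic(err)
	}
	fmt.Println("checksums match:", fromLogs == fromPlugin)
}
```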

Run requests, one per second (roughly 5 out of every 10 should be allowed; the others get 429):

while :; do curl --write-out '%{http_code}\n' --silent --output /dev/null -H 'Host: api.toystore.com' http://$GATEWAY_URL/toys -X POST | grep -E --color "\b(429)\b|$"; sleep 1; done
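Roughly half the requests get 429 because the policy allows 5 requests per 10 seconds while the loop sends about one request per second. The Go sketch below models this with a fixed-window counter; that algorithm is an assumption for illustration, and the rate-limiting service may count differently. It also ignores curl latency, so real runs send slightly fewer than 10 requests per window.

```go
package main

import "fmt"

// fixedWindowLimiter is a hypothetical, simplified model of a
// "limit N requests per window" rule using a fixed-window counter.
// Time is an integer number of seconds starting at 0.
type fixedWindowLimiter struct {
	limit       int // requests admitted per window (5 in the policy above)
	windowSecs  int // window length in seconds (10 in the policy above)
	windowStart int // start of the current window
	count       int // requests admitted in the current window
}

// allow reports whether a request arriving at second t is admitted.
func (l *fixedWindowLimiter) allow(t int) bool {
	if t-l.windowStart >= l.windowSecs {
		// The current window has expired: open a new one at t.
		l.windowStart = t
		l.count = 0
	}
	if l.count < l.limit {
		l.count++
		return true
	}
	return false
}

func main() {
	lim := &fixedWindowLimiter{limit: 5, windowSecs: 10}
	// One request per second for 20 seconds, like the curl loop.
	for t := 0; t < 20; t++ {
		code := 200
		if !lim.allow(t) {
			code = 429
		}
		fmt.Printf("t=%2ds -> %d\n", t, code)
	}
}
```

Under these assumptions, seconds 0 to 4 return 200, seconds 5 to 9 return 429, and the pattern repeats from second 10.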

Now let's verify an upgrade of the Wasm binary version.
There is no need to stop the HTTP client; it should not be affected, and rate limiting should keep working throughout.

First, delete the cached wasm binary:

rm kuadrant-wasm-shim

Undeploy the running kuadrant operator:

kubectl delete deployment kuadrant-operator-controller-manager -n kuadrant-system

Download the new wasm version, v0.4.0-alpha.4:

make wasm-shim WASM_SHIM_VERSION=v0.4.0-alpha.4

It should report the new sha256 checksum

Downloading kuadrant-wasm-shim@v0.4.0-alpha.4 from https://api.github.com/repos/Kuadrant/wasm-shim/releases/assets/173313660
sha256sum /home/eguzki/git/kuadrant/kuadrant-operator/kuadrant-wasm-shim
e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e  /home/eguzki/git/kuadrant/kuadrant-operator/kuadrant-wasm-shim

The next command will build a new operator image with the new wasm binary and deploy it

make local-deploy

Check that the WasmPlugin has been updated (reconciled by the new operator) and that it contains the new sha256 checksum of the Wasm binary served by the operator:

kubectl get wasmplugin kuadrant-istio-ingressgateway -n istio-system -o jsonpath="{.spec.sha256}"

It should return a sha256 value (yours may differ):

e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e

The sha256 checksum value should match the one shown in the operator logs

kubectl logs deployment/kuadrant-operator-controller-manager -n kuadrant-system | grep sha256

which gives

2024-06-12T09:02:39Z	INFO	kuadrant-operator	wasm-shim	{"sha256": "e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e"}

Note that before the upgrade, the sha256 was 12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8

Note that the HTTP client keeps being rate limited and does not notice the upgrade.
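Across the upgrade, the WasmPlugin URL stays the same (the Service does not change) and only the checksum differs, which is what leads the new operator to update the resource. A hypothetical Go sketch of that comparison follows; wasmPluginSpec and reconcileSpec are illustrative names, not the operator's types.

```go
package main

import "fmt"

// wasmPluginSpec models only the two WasmPlugin fields checked in the
// verification steps (spec.url and spec.sha256). Hypothetical type.
type wasmPluginSpec struct {
	URL    string
	SHA256 string
}

// reconcileSpec returns the spec to store and whether an update is needed.
func reconcileSpec(existing, desired wasmPluginSpec) (wasmPluginSpec, bool) {
	if existing == desired {
		return existing, false // already up to date: skip a no-op update
	}
	return desired, true
}

func main() {
	url := "http://kuadrant-operator-controller-manager-wasm-shim-service.kuadrant-system.svc.cluster.local:8082/kuadrant-wasm-shim"
	before := wasmPluginSpec{URL: url, SHA256: "12879567faee3d2625a4998f6a4e622ded01163133bdef2b539bfa62f921cdd8"}
	// The upgraded operator advertises the checksum of its new binary.
	desired := wasmPluginSpec{URL: url, SHA256: "e0c43b4759a86d97461377bf55c71d4a6366f709a245420e34ab12928a3e101e"}

	updated, changed := reconcileSpec(before, desired)
	fmt.Println("update needed:", changed, "new sha256:", updated.SHA256)

	_, changed = reconcileSpec(updated, desired)
	fmt.Println("update needed on next pass:", changed)
}
```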

eguzki · 2024-06-04T08:20:35Z

.github/workflows/build-images-base.yaml

@@ -23,10 +23,6 @@ on:
        description: DNS Operator bundle version
        default: latest
        type: string
-      wasmShimVersion:


heads up @didierofrivia

The wasm shim is no longer part of the operator bundle. Instead, it is added as part of the kuadrant operator image build process.

codecov · 2024-06-04T08:23:42Z

Codecov Report

Attention: Patch coverage is 73.91304% with 6 lines in your changes missing coverage. Please review.

Project coverage is 82.92%. Comparing base (ece13e8) to head (1168e54).
Report is 120 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #686      +/-   ##
==========================================
+ Coverage   80.20%   82.92%   +2.72%     
==========================================
  Files          64       77      +13     
  Lines        4492     5776    +1284     
==========================================
+ Hits         3603     4790    +1187     
- Misses        600      653      +53     
- Partials      289      333      +44

Flag	Coverage Δ
bare-k8s-integration	`4.56% <0.00%> (?)`
controllers-integration	`72.58% <73.91%> (?)`
gatewayapi-integration	`11.11% <0.00%> (?)`
integration	`?`
istio-integration	`56.23% <73.91%> (?)`
unit	`32.42% <0.00%> (+2.39%)`	⬆️

Flags with carried forward coverage won't be shown.

Components	Coverage Δ
api/v1beta1 (u)	`71.42% <ø> (ø)`
api/v1beta2 (u)	`93.58% <100.00%> (+2.16%)`	⬆️
pkg/common (u)	`88.13% <ø> (-0.70%)`	⬇️
pkg/istio (u)	`72.39% <ø> (-1.53%)`	⬇️
pkg/log (u)	`94.73% <ø> (ø)`
pkg/reconcilers (u)	`∅ <ø> (∅)`
pkg/rlptools (u)	`82.53% <ø> (+3.08%)`	⬆️
controllers (i)	`82.18% <80.49%> (+5.38%)`	⬆️

Files	Coverage Δ
...llers/rate_limiting_istio_wasmplugin_controller.go	`81.09% <100.00%> (ø)`
pkg/rlptools/wasm/server_utils.go	`100.00% <100.00%> (ø)`
pkg/rlptools/wasm/utils.go	`86.66% <ø> (ø)`
pkg/istio/mutators.go	`41.66% <0.00%> (ø)`

... and 32 files with indirect coverage changes

guicassolato · 2024-06-10T15:23:12Z

Makefile

@@ -305,15 +295,26 @@ test-unit: clean-cov generate fmt vet ## Run Unit tests.

 ##@ Build

+WASM_SHIM = $(PROJECT_PATH)/kuadrant-ratelimit-wasm


Can we use a more generic name for this, that perhaps continues to work if we, say, add something like ext authz to the functions performed by the component as well?

Suggested change

WASM_SHIM = $(PROJECT_PATH)/kuadrant-ratelimit-wasm

WASM_SHIM = $(PROJECT_PATH)/kuadrant-wasm-shim

didierofrivia: If we are planning to have just one wasm shim to do everything, @guicassolato's suggestion makes sense... however, if the name is meant for rate limiting purposes only, it's OK.

alexsnaps: I'd go for the more generic name too... unless we need to discriminate one day, which I doubt, this is the most portable option.

guicassolato: I have no idea what's better between one wasm shim per function (RL, auth) or a single one for all. I can see pros and cons for both, although today I'd probably be more inclined to a single one, I think.

eguzki: My initial idea was that if ext authz were done by wasm, it would be called kuadrant-authz-wasm and would be a different wasm binary. The distinction comes from the fact that they essentially speak different languages: rate limiting uses RLS, while ext authz uses the external authorization gRPC protocol. So the configuration of each wasm module would, I presume, be different, not to mention potentially different release streams.

But if you prefer to go for a generic name for now, I am happy with it. It can always be changed in the future.

guicassolato: I agree that RLS and ext_authz are indeed what make the two wasm-shims (or two functions of a single wasm-shim) different from one another.

On the other hand, other than that, arguably the two wasm-shims are practically identical:

both need to decide whether a request matches;

both need to evaluate well-known attributes;

both perform a grpc call to a service, basically saying "should I let traffic go through? decide based on this payload.";

both expect a boolean response, maybe with some metadata that typically become HTTP headers.

Moreover, if the auth layer needs to propagate data to the RL layer, with one wasm-shim, that can be done "over-the-wire", while with two, one wasm-shim needs to inject Envoy Dynamic Metadata so the other one can retrieve it.

eguzki: As a design pattern, I always go for "one module does one thing", for multiple reasons. But this is a discussion we do not need to have now.

guicassolato, quoting eguzki: "one module does one thing"

Not wrong. But neither should we repeat ourselves, another design principle 😜

Maybe this is one of those cases where following all the "rules" we have in life becomes impractical, like "The early bird catches the worm" but also "Good things come to those who wait."

didierofrivia · 2024-06-10T15:45:59Z

.gitignore

@@ -31,5 +31,7 @@ tmp
 /catalog/kuadrant-operator-catalog.Dockerfile
 /coverage/

+/kuadrant-ratelimit-wasm


Maybe call it kuadrant-ratelimit-shim?

eguzki: It will be kuadrant-wasm-shim.

didierofrivia · 2024-06-10T15:47:43Z

Dockerfile

@@ -16,12 +25,16 @@ COPY controllers/ controllers/
 COPY pkg/ pkg/

 # Build
-RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o manager main.go
+RUN WASM_SHIM_SHA256=$(cat /opt/kuadrant/wasm-shim/kuadrant-ratelimit-wasm.sha256) \


If we follow what Gui suggested, this var makes sense; if not, something like RATELIMIT_SHIM_SHA256 would probably make more sense.

eguzki: Keeping it as WASM_SHIM_SHA256.

didierofrivia · 2024-06-10T15:50:05Z

make/wasm-shim.mk

+	}
+
+.PHONY: wasm-shim
+wasm-shim: $(WASM_SHIM) ## Download opm locally if necessary.


## Download the wasm-shim locally if necessary

eguzki · 2024-06-12T09:05:46Z

Applied s/kuadrant-ratelimit-wasm/kuadrant-wasm-shim/g (not literally) and updated verification steps.

ready for a new review

guicassolato · 2024-06-12T09:21:01Z

config/deploy/kustomization.yaml

 resources:
  - ../default
  - ../dependencies
+patchesStrategicMerge:
+- manager_debug_mode.yaml


Are we defaulting to debug on make deploy now?

eguzki: Yes! We can no longer use make run to run the operator locally (as an actual process on your local machine), which was configured with debug level to show all details.

So now we (developers) only have make deploy as a dev/testing deployment option, and I decided to deploy with debug to have all available logs.

This does not affect OLM deployments, for which LOG_LEVEL defaults to "INFO" and LOG_MODE to production.

guicassolato: Hmmm. IDK. I think debug is OK maybe for local-deploy, but deploy is a legit target that blindly applies the deploy manifests to the current kubectl context. It seems risky to enable debug by default. If this eventually ends up propagated to the config of internal components such as auth, it would leak sensitive data to the logs by default.

Maybe comment out this patch and let devs decide when to enable it?

mikenairn, quoting eguzki: "yes! we can no longer run make run to run locally (an actual process in your local machine) which was configured with debug level to see all details."

Are we OK with this? I would pretty much always run controllers locally when developing, and a development environment where we are forced to re-deploy an image is not ideal.

eguzki: Envoy is configured to download a file from the operator. If Envoy runs in Kubernetes and the operator on your local machine, what can we do to make that download happen?

I was also heavily using make run, but the decision to serve the binary from the operator makes it very hard to keep working that way.

I am happy to keep it if you want, but it does not work for all use cases.

eguzki, quoting guicassolato: "Hmmm. IDK. I think debug is OK maybe for local-deploy, but deploy is a legit target that blindly applies the deploy manifests to the kubectl context. It seems risky to enable debug by default. If this eventually ends up propagated to internal components config such as auth, it would leak sensitive data to the logs by default."

I do not think make deploy will ever be used outside development. Anyway, it is faster to change it than to discuss it. I made the changes: make deploy is unchanged, and make local-deploy patches the deployment to set up debug/development mode.

Makefile

make local-deploy patches the deployment to have debug log level and development log mode.

eguzki · 2024-06-13T15:58:12Z

Back to draft. The following enhancements were requested offline:

Add a sha256 checksum check to the wasm binary download process.
The operator pushes the wasm binary into Envoy (using the provider API) instead of Envoy pulling it from a URL.

eguzki · 2024-07-01T13:00:18Z

There is currently no easy way to push the wasm binary into the Envoy container using the Istio API or the Envoy Gateway API.

Istio approach

When using Istio's Wasm OCI image API with an image URL, Istio's proxy (aka istio-agent, an xDS proxy living in the same container as Envoy) fetches the image, validates the sha256 checksum and then caches it locally. The Envoy configuration looks like:

config:
  name: istio-system.kuadrant-istio-ingressgateway
  vmConfig:
    runtime: envoy.wasm.runtime.v8
    code:
      local:
        filename: /var/lib/istio/data/81221938ebcbc4550eb35f72aac65d0939d89335832b5a022226fb2526806e9e/676b5f025bb67d0993fa37dc6b4de18ca515938a5d30e7451ba745c3098e485c.wasm
    configuration: {}

Envoy Gateway approach

The recently merged envoyproxy/gateway#3564, which adds the Wasm OCI feature, describes it as:

EG to download Wasm images from remote registries and serve them to the Envoy fleet via 
a local HTTP server inside EG running on 18002.

Kuadrant's available options

Therefore, the available options for Kuadrant to distribute the wasm binary are:

❌ Kuadrant dedicated wasm server component (kuadrant wasm server component #593): discarded based on Kuadrant community feedback.
❌ Wasm binary served from the kuadrant operator workload: breaks integration tests, as gateways deployed in k8s cannot reach a kuadrant operator running locally.
Wasm OCI URL: supported by Istio and Envoy Gateway (feat: Wasm OCI image, envoyproxy/gateway#3564).

eguzki added the kind/enhancement New feature or request label Jun 3, 2024

eguzki changed the title from "wasm service" to "Wasm binary served from kuadrant operator workload" Jun 3, 2024

eguzki mentioned this pull request Jun 3, 2024 in Envoy Gateway Support #325 (closed, 11 tasks)

eguzki force-pushed the wasm-service branch from a07bfbb to cd4ad88 June 4, 2024 08:19

eguzki commented Jun 4, 2024

eguzki marked this pull request as ready for review June 4, 2024 16:12

eguzki requested a review from a team as a code owner June 4, 2024 16:12

guicassolato reviewed Jun 10, 2024

didierofrivia reviewed Jun 10, 2024

eguzki mentioned this pull request Jun 12, 2024 in update release name to kuadrant-wasm-shim-${{github.ref_name}} Kuadrant/wasm-shim#56 (merged)

eguzki force-pushed the wasm-service branch from e796994 to 95dd3cd June 12, 2024 09:04

eguzki requested review from guicassolato, didierofrivia and alexsnaps June 12, 2024 09:06

guicassolato reviewed Jun 12, 2024

eguzki added 8 commits June 12, 2024 15:27

wasm service

8332e8b

link rate limit wasm module at build time

e1e22f8

fix tests

4559470

wasm-service: small enhancements

d7e890d

s/kuadrant-ratelimit-wasm/kuadrant-wasm-shim/g

5ea6064

fix rebase issues

e518710

Makefile: local-cleanup target removes local wasm-shim binary

2a042c9

make deploy keeps the log level defaults.

1168e54

make local-deploy patches the deployment to have debug log level and development log mode.

eguzki force-pushed the wasm-service branch from 95dd3cd to 1168e54 June 12, 2024 15:28

eguzki marked this pull request as draft June 13, 2024 15:56

eguzki closed this Jul 1, 2024

eguzki mentioned this pull request Dec 12, 2024 in Support protected registry when loading WASM image #1077 (closed)
