This is the hello world of pipelines.
It uses a cron schedule as a source and then simply cats the message to a log sink.
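A minimal sketch of what such a pipeline can look like (the name and schedule are placeholders; the real spec is in the example YAML):

```yaml
apiVersion: dataflow.argoproj.io/v1alpha1
kind: Pipeline
metadata:
  name: hello
spec:
  steps:
    - name: main
      cat: {}                           # just copy each message through
      sources:
        - cron:
            schedule: "*/3 * * * * *"   # placeholder schedule
      sinks:
        - log: {}
```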
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/101-hello-pipeline.yaml
This example shows a pipeline with two nodes.
While the steps read from Kafka, they are connected to each other by a NATS Streaming subject.
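A sketch of the two-step shape described above; the topic and subject names are placeholders, so check the example YAML for the exact spec:

```yaml
steps:
  - name: a
    cat: {}
    sources:
      - kafka:
          topic: input-topic        # placeholder Kafka topic
    sinks:
      - stan:
          subject: a-b              # the NATS Streaming subject connecting the two steps
  - name: b
    cat: {}
    sources:
      - stan:
          subject: a-b
    sinks:
      - log: {}
```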
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/101-two-node-pipeline.yaml
This is an example of the built-in de-duplication step.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/102-dedupe-pipeline.yaml
This is an example of built-in filtering.
Filters are written using expression syntax and must return a boolean.
They have a single variable, msg, which is a byte array.
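For instance, a filter step that only passes messages containing a given substring might look like the sketch below; the topic names, the substring, and the string helper are placeholders/assumptions, so refer to the example YAML for the exact expression syntax:

```yaml
steps:
  - name: main
    # the expression must return a boolean; msg is the raw message as a byte array
    filter: 'string(msg) contains "-"'
    sources:
      - kafka:
          topic: input-topic        # placeholder
    sinks:
      - kafka:
          topic: output-topic       # placeholder
```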
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/102-filter-pipeline.yaml
This is an example of built-in flattening and expanding.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/102-flatten-expand-pipeline.yaml
This is an example of built-in mapping.
Maps are written using expression syntax and must return a byte array.
They have a single variable, msg, which is a byte array.
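As a sketch, a map step that prefixes each message might look like this; the topic names and the bytes/string helpers are placeholders/assumptions, so see the example YAML for the exact expression syntax:

```yaml
steps:
  - name: main
    # the expression must return a byte array; msg is the incoming byte array
    map: 'bytes("hello " + string(msg))'
    sources:
      - kafka:
          topic: input-topic        # placeholder
    sinks:
      - kafka:
          topic: output-topic       # placeholder
```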
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/102-map-pipeline.yaml
This is an example of having multiple replicas for a single step.
Replicas are automatically scaled up and down according to the desired-replicas formula, which can be computed using the following:
- pending: the total number of pending messages.
- pendingDelta: the change in the number of pending messages.
- currentReplicas: the current number of replicas.
- limit(v, min, max, delta): a function to constrain the minimum and maximum number of replicas, as well as the maximum step up/down.
In this example:
- Each period is 60s.
- Each replica can consume 250 messages each second.
- We want to consume all pending messages in 10 periods.
- We want to have between 0 and 4 replicas, and scale-up or down maximum 2 replicas at a time.
You can scale to zero. The number of replicas will be periodically scaled to 1 so it can "peek" at the message queue. The number of pending messages is measured and the target number of replicas re-calculated.
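Putting those numbers together, the formula divides the pending count by what one replica can process over the horizon (250 messages/s × 60 s × 10 periods) and clamps the result with limit. The field names (scale, desiredReplicas) are assumptions about the step spec here, so check the example YAML for the exact syntax:

```yaml
steps:
  - name: main
    cat: {}
    scale:
      # 0..4 replicas, moving by at most 2 at a time
      desiredReplicas: "limit(pending / (250 * 60 * 10), 0, 4, 2)"
```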
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/103-autoscaling-pipeline.yaml
This is an example of having multiple replicas for a single step.
Steps can be manually scaled using kubectl:
kubectl scale step/scaling-main --replicas 3
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/103-scaling-pipeline.yaml
This is an example of the Go 1.17 handler.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/104-golang1-17-pipeline.yaml
This example is of the Java 16 handler.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/104-java16-pipeline.yaml
This example is of the NodeJS 16 handler.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/104-node16-pipeline.yaml
This example is of the Python 3.9 handler.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/104-python3-9-pipeline.yaml
This is an example of a pipeline using Git.
The Git handler allows you to check your application source code into Git. Dataflow will check out and build your code when the step starts. This example shows how to use a Git step with the Go runtime.
Link to the directory that will be cloned with Git
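A rough sketch of a Git step for the Go runtime is below. The git handler field names (url, branch, path, image, command), the repository path, and the topic names are assumptions or placeholders; refer to the example YAML for the exact fields:

```yaml
steps:
  - name: main
    git:
      url: https://github.com/argoproj-labs/argo-dataflow
      branch: main
      path: examples/git            # placeholder: directory that will be cloned and built
      image: golang:1.17
      command: [go, run, .]         # assumption: how the checked-out code is run
    sources:
      - kafka:
          topic: input-topic        # placeholder
    sinks:
      - kafka:
          topic: output-topic       # placeholder
```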
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/106-git-go-pipeline.yaml
This is an example of a pipeline using Git with NodeJS.
The Git handler allows you to check your application source code into Git. Dataflow will check out and build your code when the step starts. This example shows how to use a Git step with the NodeJS runtime.
Link to the directory that will be cloned with Git
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/106-git-nodejs-pipeline.yaml
This is an example of a pipeline using Git, showing how to use a generator step.
The Git handler allows you to check your application source code into Git. Dataflow will check out and build your code when the step starts. This example shows how to use a Git generator step with the Python runtime. Generator steps are long-running processes that generate values over time; such a step doesn't have a source, only sink(s).
Link to the directory that will be cloned with Git
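The distinguishing feature of a generator step is its shape: it has sinks but no sources. A heavily hedged sketch (the git handler fields, path, and command are assumptions, as above):

```yaml
steps:
  - name: main
    git:
      url: https://github.com/argoproj-labs/argo-dataflow
      branch: main
      path: examples/git            # placeholder: directory containing the generator code
      image: python:3.9
      command: [python3, main.py]   # placeholder
    # no sources: the handler produces messages on its own
    sinks:
      - log: {}
```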
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/106-git-python-generator-pipeline.yaml
This is an example of a pipeline using Git.
The Git handler allows you to check your application source code into Git. Dataflow will check out and build your code when the step starts. This example shows how to use a Git step with the Python runtime.
Link to the directory that will be cloned with Git
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/106-git-python-pipeline.yaml
This example shows a pipeline running to completion.
A pipeline that runs to completion (aka "terminating") is one that will finish.
For a pipeline to terminate one of two things must happen:
- Every step exits successfully (i.e. with exit code 0).
- One step exits successfully and is marked with terminator: true. When this happens, all other steps are killed, as sketched below.
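A sketch of the second case, assuming a simple container step (the image, command, and schedule are placeholders):

```yaml
steps:
  - name: main
    terminator: true                 # when this step exits successfully, the other steps are killed
    container:
      image: ubuntu:latest           # placeholder image
      command: [sh, -c, "exit 0"]    # placeholder: do some work, then exit 0
  - name: forever
    cat: {}
    sources:
      - cron:
          schedule: "*/3 * * * * *"  # placeholder
    sinks:
      - log: {}
```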
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/107-completion-pipeline.yaml
This example demonstrates having a terminator step, and then terminating other steps using different termination strategies.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/107-terminator-pipeline.yaml
This example showcases container options.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/108-container-pipeline.yaml
This example uses named pipes to send and receive messages.
Two named pipes are made available:
- The container can read lines from /var/run/argo-dataflow/in. Each line will be a single message.
- The container can write to /var/run/argo-dataflow/out. Each line MUST be a single message.
You MUST escape new lines.
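As a sketch, a container step could bridge the two pipes like this; the fifo flag is an assumption about how the pipes are enabled, and the image, command, and topics are placeholders, so check the example YAML:

```yaml
steps:
  - name: main
    container:
      image: ubuntu:latest           # placeholder
      command:
        - sh
        - -c
        # read each incoming line and write it back out, one message per line
        - cat /var/run/argo-dataflow/in > /var/run/argo-dataflow/out
      fifo: true                     # assumption: opts the container into the named pipes
    sources:
      - kafka:
          topic: input-topic         # placeholder
    sinks:
      - kafka:
          topic: output-topic        # placeholder
```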
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/108-fifos-pipeline.yaml
This is an example of built-in grouping.
WARNING: The spec/syntax has not been finalized yet. Please tell us how you think it should work!
There are four mandatory fields:
- key: a string expression that returns the message's key.
- endOfGroup: a boolean expression that returns whether or not to end the group and send the grouped messages onwards.
- format: what format the grouped messages should be in.
- storage: where to store messages ready to be forwarded.
Storage can either be:
- An ephemeral volume - you don't mind losing some or all messages (e.g. development or pre-production).
- A persistent volume - you want to be able to recover (e.g. production).
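Since the syntax is not final, the following is only an illustrative sketch of the four fields; the expressions, the format value, and the empty storage block are assumptions, not the finalized spec:

```yaml
steps:
  - name: main
    group:
      key: 'string(msg)'                    # string expression: the message's key (trivial example)
      endOfGroup: 'string(msg) == "end"'    # boolean expression: close the group on an "end" message
      format: JSONStringArray               # assumption: one of the supported formats
      storage: {}                           # ephemeral or persistent volume configuration goes here
```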
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/109-group-pipeline.yaml
This pipeline processes pets (cats and dogs).
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/201-vetinary-pipeline.yaml
This pipeline counts the number of words in a document, not the count of each word as you might expect.
It also shows an example of a pipeline that terminates based on a single step's status.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/201-word-count-pipeline.yaml
This example uses a cron source and a log sink.
You can format dates using a "layout":
https://golang.org/pkg/time/#Time.Format
By default, the layout is RFC3339.
- Cron sources are unreliable. Messages will not be sent when a pod is not running, which can happen at any time in Kubernetes.
- Cron sources must not be scaled to zero. They will stop working.
The log sink simply logs the message.
- Log sinks are totally reliable.
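A sketch of a cron source with a custom layout and a log sink; the schedule and layout values are placeholders:

```yaml
steps:
  - name: main
    cat: {}
    sources:
      - cron:
          schedule: "*/3 * * * * *"   # placeholder
          layout: "15:04:05"          # Go time layout; defaults to RFC3339
    sinks:
      - log: {}
```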
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-cron-log-pipeline.yaml
This example showcases retry policies.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-erroring-pipeline.yaml
This example uses HTTP sources and sinks.
HTTP has the advantage that it is stateless and therefore cheap. You do not need to set up any storage for your messages between steps.
- An HTTP sink may return an error code; clients will need to re-send messages.
- Adding replicas will nearly linearly increase throughput.
- Due to the lack of state, do not use HTTP sources and sinks to connect steps.
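A sketch of a step with an HTTP source and an HTTP sink; the sink URL is a placeholder for whatever downstream service you post to:

```yaml
steps:
  - name: main
    cat: {}
    sources:
      - http: {}                              # exposes an endpoint that messages can be POSTed to
    sinks:
      - http:
          url: http://my-service/messages     # placeholder downstream endpoint
```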
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-http-pipeline.yaml
This example shows reading and writing to a JetStream subject.
- Adding replicas will nearly linearly increase throughput.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-jetstream-pipeline.yaml
This example shows reading and writing to a Kafka topic.
- Kafka topics are typically partitioned. Dataflow will process each partition simultaneously.
- Adding replicas will nearly linearly increase throughput.
- If you scale beyond the number of partitions, those additional replicas will be idle.
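A sketch of a step reading from one Kafka topic and writing to another, with the replica count kept at or below the partition count; topic names and the replica number are placeholders:

```yaml
steps:
  - name: main
    cat: {}
    replicas: 2                       # keep this at or below the topic's partition count
    sources:
      - kafka:
          topic: input-topic          # placeholder
    sinks:
      - kafka:
          topic: output-topic         # placeholder
```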
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-kafka-pipeline.yaml
This example shows reading and writing to a STAN subject.
- Adding replicas will nearly linearly increase throughput.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-stan-pipeline.yaml
This example has two sinks.
- When using two sinks, you should put the most reliable first in the list: if the message cannot be delivered to a sink, then subsequent sinks will not get the message.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-two-sinks-pipeline.yaml
This example has two sources.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/301-two-sources-pipeline.yaml
This is an example of providing a bearer token for an HTTP source.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/dataflow-103-http-main-source-default-secret.yaml
This is an example of providing a named JetStream configuration for a namespace.
The secret must be named dataflow-jetstream-${name}.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/dataflow-jetstream-default-secret.yaml
This is an example of providing a named Kafka configuration for a namespace.
The secret must be named dataflow-kafka-${name}.
brokers: broker.a,broker.b
version: "2.0.0"
net.tls.caCert: ""
net.tls.cert: ""
net.tls.key: ""
net.sasl.mechanism: PLAIN
net.sasl.user: ""
net.sasl.password: ""
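Put together as a Kubernetes Secret, the default configuration might look like the sketch below; stringData is used here for readability and the broker values are placeholders:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: dataflow-kafka-default     # dataflow-kafka-${name}, here name=default
stringData:
  brokers: broker.a,broker.b       # placeholder broker list
  version: "2.0.0"
  net.tls.caCert: ""
  net.tls.cert: ""
  net.tls.key: ""
  net.sasl.mechanism: PLAIN
  net.sasl.user: ""
  net.sasl.password: ""
```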
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/dataflow-kafka-default-secret.yaml
This is an example of providing a named S3 configuration for a namespace.
The secret must be named dataflow-s3-${name}.
Learn about configuration
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/dataflow-s3-default-secret.yaml
This is an example of providing a named NATS Streaming configuration for a namespace.
The secret must be named dataflow-stan-${name}.
kubectl apply -f https://raw.githubusercontent.com/argoproj-labs/argo-dataflow/main/examples/dataflow-stan-default-secret.yaml