Pipelaner is a high-performance, efficient framework and agent for building data pipelines. Pipeline descriptions are based on the Configuration as Code concept and Pkl, Apple's configuration language.
Pipelaner manages data streams through three key entities: Generator, Transform, and Sink.
**Generator**: the component that creates or retrieves source data for the pipeline. Generators can produce messages, events, or pull data from sources such as files, databases, or APIs.
- Example use case: reading data from a file or receiving events via webhooks.
**Transform**: the component that processes data within the pipeline. Transforms perform operations such as filtering, aggregation, transformation, or cleaning to prepare data for further processing.
- Example use case: filtering records based on specific conditions or converting data from JSON to CSV.
**Sink**: the final destination of the data stream. Sinks send processed data to a target system, such as a database, API, or message queue.
- Example use case: saving data to PostgreSQL or sending it to a Kafka topic.
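Conceptually, a pipeline chains these three entities together. The sketch below is illustrative only: the `Transforms.Filter` and `Sinks.Console` class names and the wiring of components by `inputs` name are assumptions patterned on the Kafka example later in this document, not the exact Pipelaner schema.

```pkl
// Hypothetical sketch: a Kafka generator feeds a filter transform,
// which feeds a console sink. Class names and the `inputs` wiring
// are assumptions, not the verbatim Pipelaner Pkl schema.
input = new Inputs.Kafka {
    name = "events-in"
    common {
        topics {
            "events"
        }
    }
}

transform = new Transforms.Filter {
    name = "errors-only"
    inputs { "events-in" }   // assumed: reference upstream elements by name
}

sink = new Sinks.Console {
    name = "print"
    inputs { "errors-only" }
}
```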
All pipeline elements share the following basic parameters:

| Parameter | Type | Description |
|---|---|---|
| `name` | String | Unique name of the pipeline element. |
| `threads` | Int | Number of threads for processing messages. Defaults to the value of `GOMAXPROCS`. |
| `outputBufferSize` | Int | Size of the output buffer. Not applicable to Sink components. |
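For illustration, here is how these shared parameters might sit on a component definition; the `Transforms.Filter` class name is an assumption patterned on the Kafka example below.

```pkl
// Illustrative only: the shared parameters on a hypothetical transform.
new Transforms.Filter {
    name = "filter-errors"    // required, unique within the pipeline
    threads = 4               // omit to default to GOMAXPROCS
    outputBufferSize = 1024   // not applicable to sinks, which produce no output
}
```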
Built-in generators:

| Name | Description |
|---|---|
| `cmd` | Reads the output of a command, e.g., `"/usr/bin/log" "stream --style ndjson"`. |
| `kafka` | Apache Kafka consumer that streams values into the pipeline. |
| `pipelaner` | gRPC server that streams values from other Pipelaner nodes into the pipeline. |
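As a hypothetical sketch of the `cmd` generator from the table above (the `exec` and `args` parameter names are assumptions, not the real schema):

```pkl
// Hypothetical: stream ndjson records from the macOS log utility.
new Inputs.Cmd {
    name = "os-log"
    exec = "/usr/bin/log"                        // assumed parameter name
    args = List("stream", "--style", "ndjson")   // assumed parameter name
}
```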
Built-in transforms:

| Name | Description |
|---|---|
| `batch` | Forms batches of data with a specified size. |
| `chunks` | Splits incoming data into chunks. |
| `debounce` | Eliminates "bounce" (frequent repeats) in data. |
| `filter` | Filters data based on specified conditions. |
| `remap` | Reassigns fields or transforms the data structure. |
| `throttling` | Limits the data processing rate. |
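For example, a `batch` transform might be configured like this; the `Transforms.Batch` class name and the `size` parameter are assumptions following the same naming pattern as the Kafka example below:

```pkl
// Hypothetical: group incoming records into batches of 100
// before passing them downstream.
new Transforms.Batch {
    name = "batch-100"
    size = 100   // assumed parameter: records per batch
}
```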
Built-in sinks:

| Name | Description |
|---|---|
| `clickhouse` | Sends data to a ClickHouse database. |
| `console` | Outputs data to the console. |
| `http` | Sends data to a specified HTTP endpoint. |
| `kafka` | Publishes data to Apache Kafka. |
| `pipelaner` | Streams data via gRPC to other Pipelaner nodes. |
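A hypothetical `http` sink configuration, with the `Sinks.Http` class name and `url` parameter as assumptions:

```pkl
// Hypothetical: send each processed record to an HTTP endpoint.
new Sinks.Http {
    name = "post-results"
    url = "https://example.com/ingest"   // assumed parameter name
}
```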
Pipelaner can operate on a single host or distribute data processing across multiple hosts.
For distributed interaction between nodes, you can use:

- gRPC: via generators and sinks with the parameter `sourceName: "pipelaner"`.
- Apache Kafka: for reading and writing data via topics.
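A hypothetical two-node gRPC link might look like the sketch below; only `sourceName: "pipelaner"` comes from the list above, while the class names and the `host`/`port` parameters are assumptions.

```pkl
// Node A: forward processed data to node B over gRPC.
new Sinks.Pipelaner {
    name = "to-node-b"
    sourceName = "pipelaner"
    host = "node-b.internal"   // assumed parameter
    port = 8081                // assumed parameter
}

// Node B: receive the gRPC stream and feed it into the local pipeline.
new Inputs.Pipelaner {
    name = "from-node-a"
    sourceName = "pipelaner"
    host = "0.0.0.0"
    port = 8081
}
```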
Example configuration using Kafka:

```pkl
// Generator: consume messages from the shared Kafka topic.
new Inputs.Kafka {
    ...
    common {
        ...
        topics {
            "kafka-topic"
        }
    }
}
```

```pkl
// Sink: publish messages to the shared Kafka topic.
new Sinks.Kafka {
    ...
    common {
        ...
        topics {
            "kafka-topic"
        }
    }
}
```

One node's sink publishes to `kafka-topic` while another node's generator consumes from it, linking the two pipelines through the topic.
| Example | Description |
|---|---|
| Basic Pipeline | A simple example illustrating the creation of a basic pipeline with prebuilt components. |
| Custom Components | An advanced example showing how to create and integrate custom Generators, Transforms, and Sinks. |
- 🌟 **Basic Pipeline**: learn the fundamentals of creating a pipeline with minimal configuration using ready-to-use components.
- 🛠 **Custom Components**: extend Pipelaner’s functionality by developing your own Generators, Transforms, and Sinks.
Each example includes clear configuration files and explanations to help you get started quickly.
💡 Tip: Use these examples as templates to customize and build your own pipelines efficiently.
If you have questions, suggestions, or encounter any issues, please create an Issue in the repository.
You can also participate in discussions in the Discussions section.
This project is licensed under the Apache 2.0 license.
You are free to use, modify, and distribute the code under the terms of the license.