Kafka POCs

Kafka POC projects

How to configure Kafka consumers to seek offsets by timestamp
- Normally, we consume Kafka messages from the beginning or end of a topic, or from the last committed offsets. For backfilling or troubleshooting, however, we occasionally need to consume messages from a certain timestamp. The Kafka consumer class of the kafka-python package has a method to seek a particular offset for a topic partition. Therefore, if we know which topic partition to read, for example because we assign it manually, we can easily override the fetch offset. When we deploy multiple consumer instances together, however, we make them subscribe to a topic, and topic partitions are assigned dynamically, which means we do not know in advance which topic partition will be assigned to a consumer instance. In this post, we discuss how to configure the Kafka consumer to seek offsets by timestamp when topic partitions are dynamically assigned by subscription.
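The idea described above can be sketched roughly as follows. This is a minimal illustration, not the code from the offset-seeking project: the helper names `seek_to_timestamp` and `subscribe_from_timestamp` are hypothetical, and the sketch assumes kafka-python's `offsets_for_times`, `seek` and `ConsumerRebalanceListener` are used to reposition each partition at the moment it is assigned.

```python
def seek_to_timestamp(consumer, partitions, timestamp_ms):
    """Seek each partition to the earliest offset whose timestamp is >= timestamp_ms.

    `consumer` is expected to behave like kafka-python's KafkaConsumer.
    Returns a dict of the partitions that were repositioned and their new offsets.
    """
    # offsets_for_times maps each partition to an entry with an `offset` field,
    # or to None when no message exists at or after the timestamp.
    found = consumer.offsets_for_times({tp: timestamp_ms for tp in partitions})
    positions = {}
    for tp in partitions:
        entry = found.get(tp)
        if entry is None:
            # Nothing to seek to; leave the default fetch position untouched.
            continue
        consumer.seek(tp, entry.offset)
        positions[tp] = entry.offset
    return positions


def subscribe_from_timestamp(consumer, topic, timestamp_ms):
    """Subscribe to `topic` and seek by timestamp whenever partitions are assigned."""
    # Imported lazily so that the helper above can be exercised without a broker.
    from kafka import ConsumerRebalanceListener

    class SeekOnAssign(ConsumerRebalanceListener):
        def on_partitions_revoked(self, revoked):
            pass  # nothing to clean up in this sketch

        def on_partitions_assigned(self, assigned):
            # Partitions are only known here, after the group rebalance.
            seek_to_timestamp(consumer, assigned, timestamp_ms)

    consumer.subscribe(topics=[topic], listener=SeekOnAssign())
```

With a real `KafkaConsumer` that belongs to a consumer group, a call such as `subscribe_from_timestamp(consumer, "orders", 1700000000000)` (the topic name and timestamp are placeholders) would be followed by the usual polling loop. Partitions with no message at or after the timestamp keep whatever position `auto_offset_reset` or the committed offsets give them.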
Simplify Streaming Ingestion on AWS – Part 1 MSK and Redshift
- Apache Kafka is a popular distributed event store and stream processing platform. Previously, loading data from Kafka into Redshift and Athena usually required Kafka connectors (e.g. Amazon Redshift Sink Connector and Amazon S3 Sink Connector). These AWS services have recently added features to ingest data from Kafka directly, which allows a simpler architecture that achieves low-latency and high-speed ingestion of streaming data. In Part 1 of the Simplify Streaming Ingestion on AWS series, we discuss how to develop an end-to-end streaming ingestion solution using EventBridge, Lambda, MSK and Redshift Serverless on AWS.
Simplify Streaming Ingestion on AWS – Part 2 MSK and Athena
- In Part 1, we discussed a streaming ingestion solution using EventBridge, Lambda, MSK and Redshift Serverless. Athena provides an MSK connector that enables SQL queries directly on Apache Kafka topics, so insights can be extracted without setting up an additional pipeline to store data in S3. In this post, we discuss how to update the streaming ingestion solution so that data in the Kafka topic can be queried by Athena instead of Redshift.
Integrate Glue Schema Registry With Your Python Kafka App
- Glue Schema Registry provides a centralized repository for managing and validating schemas for topic message data. Many AWS services can use its features when building data streaming applications. In this post, we discuss how to integrate Python Kafka producer and consumer apps in AWS Lambda with the Glue Schema Registry.
Kafka Development with Docker
- Apache Kafka is one of the key technologies for modern data streaming architectures on AWS. Developing and testing Kafka-related applications can be easier using Docker and Docker Compose. In this series of posts, I will demonstrate reference implementations of those applications in Dockerized environments.

Folders and files
.vscode
ccdak
glue-schema-registry
integration-athena
integration-redshift
kafka-connect-for-aws
kafka-dev-on-k8s
kafka-dev-with-docker
misc
msk-lab
offset-seeking
security-course
.gitignore
README.md

jaehyeon-kim/kafka-pocs
