Roadmap for Xinfra Monitor

Andrew Choi edited this page Jun 25, 2020 · 6 revisions

Here are a few items on the roadmap that we plan to work on to make Xinfra Monitor more useful.

Monitoring the Full Propagation of Topic Creations and Deletions within the Cluster

What is the availability of a new topic's data and metadata, and how long do they take to propagate to every broker in the cluster?

Priority: FY21-Q1
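
As a rough illustration of what such a measurement could report, the sketch below summarizes propagation from per-broker observations. It is not taken from Xinfra Monitor: the function name `propagation_summary`, its inputs, and the definition of availability as the fraction of brokers that have observed the topic are all assumptions made for this example.

```python
from typing import Dict, Optional


def propagation_summary(
    created_at_ms: int,
    first_seen_ms: Dict[int, Optional[int]],
) -> dict:
    """Summarize how far a topic creation has propagated.

    created_at_ms: time the topic creation was requested (hypothetical input).
    first_seen_ms: broker id -> time the broker first reported the topic,
                   or None if it has not reported it yet (hypothetical input).
    """
    total = len(first_seen_ms)
    seen = [t for t in first_seen_ms.values() if t is not None]

    # Assumed definition: availability is the fraction of brokers that know the topic.
    availability = len(seen) / total if total else 0.0

    # Full propagation time is only defined once every broker has seen the topic.
    if total and len(seen) == total:
        full_propagation_ms = max(seen) - created_at_ms
    else:
        full_propagation_ms = None

    return {
        "availability": availability,
        "full_propagation_ms": full_propagation_ms,
    }
```

The same summary shape would apply to topic deletions by recording when each broker first stops reporting the topic.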

Monitoring Effects of Metadata Changes within the Cluster - Partition expansion

For instance, how long does it take for leadership information of the new partitions to propagate to every broker in the cluster?

Priority: FY21-Q1

Monitoring Effects of Metadata Changes within the Cluster - ACLs propagation

For example, how long does it take for the ACL metadata to propagate to every broker in the cluster?

Priority: FY21-Q1

Integration with Graphite and similar frameworks

It is useful for users to be able to view all Kafka-related metrics from one web service in their organization. Graphite is one of the most popular open source solutions for storing metrics and viewing them as time-series graphs. We plan to improve the existing DefaultMetricsReporterService so that users can export Xinfra Monitor metrics to Graphite and other metrics storage services of their choice.

This involves third-party libraries and services that LinkedIn does not use or is not closely involved with. If users in the open source community want to maintain this feature with sound documentation and tests, that is okay.
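
For readers unfamiliar with Graphite, the sketch below shows the general shape of such an export using Graphite's plaintext protocol, in which each data point is one line of the form `<metric path> <value> <timestamp>`. It is an illustration only, not the planned DefaultMetricsReporterService change: the function names and the metric path in the usage note are hypothetical, and port 2003 is assumed as the plaintext listener port of the Carbon daemon.

```python
import socket
import time
from typing import Iterable, Optional


def format_graphite_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Build one plaintext-protocol line: '<path> <value> <unix seconds>' plus newline."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{path} {value} {ts}\n"


def send_to_graphite(lines: Iterable[str], host: str, port: int = 2003,
                     timeout: float = 5.0) -> None:
    """Send already formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
```

A caller would format one line per metric, for example `format_graphite_line("xinfra.produce.availability", 0.99)` with a hypothetical metric path, and pass the batch to `send_to_graphite` along with the host of its own Graphite deployment.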

Various improvements to test scheduling

Users should have the ability to schedule custom actions (e.g. broker bounce, broker hard kill) to be executed at regular intervals. This can be used together with other services to make assertions (e.g. no message loss, no message reordering) about Kafka's performance under a variety of scenarios. It can be deployed on your private Kafka cluster to test Kafka's performance and fault tolerance.

This could be implemented by other services or by Xinfra Monitor itself. It is perhaps more applicable to Cruise Control, since it is hard to know when these actions are safe to perform without all the data that Cruise Control has.
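
The idea of pairing a scheduled action with an assertion can be sketched as a simple loop. This is a minimal sketch and not an existing Xinfra Monitor or Cruise Control API: `run_scheduled`, `action`, and `check` are hypothetical names, and a real implementation would also need the safety checks discussed above before disrupting a broker.

```python
import time
from typing import Callable, List


def run_scheduled(
    action: Callable[[], None],
    check: Callable[[], bool],
    rounds: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Run `action` every `interval_s` seconds and record the result of `check`.

    action: the custom action, e.g. a broker bounce (hypothetical callable).
    check:  the assertion to evaluate after each action, e.g. "no message loss".
    sleep:  injectable so the loop can be exercised without real waiting.
    """
    results: List[bool] = []
    for i in range(rounds):
        action()
        results.append(bool(check()))
        # Wait between rounds, but not after the final one.
        if i < rounds - 1:
            sleep(interval_s)
    return results
```

Passing the sleep function as a parameter keeps the loop testable: a test can substitute a recorder for `time.sleep` and verify the intervals without waiting.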

Automatic cluster deployment

Another item of future work is to provide the capability to deploy a Kafka cluster built from Apache Kafka at a user-specified git hash. This would allow us to automatically test a range of Kafka commits and catch bugs that may be missed by Apache Kafka's unit tests or system tests.

This could be implemented in external services other than Xinfra Monitor, including Cruise Control. It is difficult to know when these actions are safe to perform without all the data that Cruise Control has.