This project provides a simple but realistic example of a Kafka producer and consumer compatible with MapR Streams.
This project is a copy of the Sample Programs for Kafka 0.9.0.
The application logic is exactly the same as in the original Kafka programs. The only differences are:
- the topic names use the MapR Streams naming convention /stream:topic
- the application is executed in the context of a MapR Client and its dependencies.
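The naming convention can be illustrated with a small helper. This is a minimal sketch and not part of the sample programs: the class StreamTopicNames and the method fullTopicName are hypothetical names, and the validation rules (stream path starts with a slash, topic contains no colon) are assumptions made for illustration only.

```java
// Hypothetical helper, not part of the sample programs.
class StreamTopicNames {

    // Joins a stream path and a topic name into the /stream:topic form.
    static String fullTopicName(String streamPath, String topic) {
        // Assumption: a stream path is an absolute path in the cluster file system.
        if (streamPath == null || !streamPath.startsWith("/")) {
            throw new IllegalArgumentException("stream path must start with '/': " + streamPath);
        }
        // Assumption: the colon is reserved as the separator, so a topic cannot contain it.
        if (topic == null || topic.isEmpty() || topic.contains(":")) {
            throw new IllegalArgumentException("invalid topic name: " + topic);
        }
        return streamPath + ":" + topic;
    }

    public static void main(String[] args) {
        // Uses the stream and topic names created later in this walkthrough.
        System.out.println(fullTopicName("/sample-stream", "fast-messages"));
        System.out.println(fullTopicName("/sample-stream", "summary-markers"));
    }
}
```

The resulting string is what a producer or consumer would pass wherever a plain Kafka topic name was used before.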
To start, you need a running MapR 5.2 cluster. You can install your own cluster or download a sandbox.
A stream is a collection of topics that you can manage together for security, default number of partitions, and time to live for the messages.
Run the following command on your MapR cluster:
$ maprcli stream create -path /sample-stream
By default, the produce and consume topic permissions are granted only to the creator of the stream, that is, the Unix user you are using to run the maprcli command.
You can change the permissions by editing the stream. For example, to make all the topics available to anybody (public permission), run the following command:
$ maprcli stream edit -path /sample-stream -produceperm p -consumeperm p -topicperm p
This is useful for this example since we want to run the producer and consumer from remote computers too.
We need two topics for the example program, which are also created with the maprcli tool:
$ maprcli stream topic create -path /sample-stream -topic fast-messages
$ maprcli stream topic create -path /sample-stream -topic summary-markers
The topics can then be listed:
$ maprcli stream topic list -path /sample-stream
topic            partitions  logicalsize  consumers  maxlag  physicalsize
fast-messages    1           0            0          0       0
summary-markers  1           0            0          0       0
Note that the program will automatically create the topic if it does not already exist.
Go back to the directory that contains the example programs, then compile and build them.
$ cd ..
$ mvn package
...
The project creates a jar with all external dependencies (./target/mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar).
You can install the MapR Client and run the application locally, or copy the jar file to any node of your cluster.
For example, copy the program to your server using scp:
$ scp ./target/mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar mapr@<YOUR_MAPR_CLUSTER>:/home/mapr
The producer will send a large number of messages to /sample-stream:fast-messages along with occasional messages to /sample-stream:summary-markers. Since there isn't any consumer running yet, nobody will receive the messages.
Any MapR Streams application needs the MapR Client libraries in order to run; you just have to add them to the application classpath.
The classpath is available from the command /opt/mapr/bin/mapr classpath, so you can run the application using the following command:
$ java -cp `mapr classpath`:./mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.mapr.examples.Run producer
Sent msg number 0
Sent msg number 1000
...
Sent msg number 998000
Sent msg number 999000
In another window you can run the consumer using the following command:
$ java -cp `mapr classpath`:./mapr-streams-examples-1.0-SNAPSHOT-jar-with-dependencies.jar com.mapr.examples.Run consumer
1 messages received in period, latency(min, max, avg, 99%) = 20352, 20479, 20416.0, 20479 (ms)
1 messages received overall, latency(min, max, avg, 99%) = 20352, 20479, 20416.0, 20479 (ms)
1000 messages received in period, latency(min, max, avg, 99%) = 19840, 20095, 19968.3, 20095 (ms)
1001 messages received overall, latency(min, max, avg, 99%) = 19840, 20479, 19968.7, 20095 (ms)
...
1000 messages received in period, latency(min, max, avg, 99%) = 12032, 12159, 12119.4, 12159 (ms)
998001 messages received overall, latency(min, max, avg, 99%) = 12032, 20479, 15073.9, 19583 (ms)
1000 messages received in period, latency(min, max, avg, 99%) = 12032, 12095, 12064.0, 12095 (ms)
999001 messages received overall, latency(min, max, avg, 99%) = 12032, 20479, 15070.9, 19583 (ms)
Note the large latencies listed in the summaries for the message batches. This is because the consumer wasn't running when the messages were sent, so it only receives them long after they were produced.
The consumer should, however, gnaw its way through the backlog pretty quickly, and the per-batch latency should be shorter by the end of the run than at the beginning. If the producer is still running by the time the consumer catches up, the latencies will probably drop into the single-digit millisecond range.
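To make the summary columns concrete, here is a minimal sketch of how min, max, average, and 99th percentile latencies could be computed for one batch. It is an illustration only and makes no claim about how the sample consumer computes them: the class LatencySummary is a hypothetical name, and the nearest-rank percentile on a sorted array is an assumed method.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of the sample programs.
class LatencySummary {
    final int count;
    final long min;
    final long max;
    final double avg;
    final long p99;

    LatencySummary(long[] latenciesMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("need at least one latency sample");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        count = sorted.length;
        min = sorted[0];
        max = sorted[count - 1];
        long sum = 0;
        for (long v : sorted) {
            sum += v;
        }
        avg = (double) sum / count;
        p99 = percentile(sorted, 99);
    }

    // Nearest-rank percentile: the smallest value with at least `percent` % of samples at or below it.
    static long percentile(long[] sorted, int percent) {
        int rank = (percent * sorted.length + 99) / 100; // integer ceiling of percent * n / 100
        return sorted[Math.max(rank, 1) - 1];
    }

    // Formats the summary in the same column order as the consumer output shown above.
    String format(String scope) {
        return String.format(Locale.ROOT,
                "%d messages received %s, latency(min, max, avg, 99%%) = %d, %d, %.1f, %d (ms)",
                count, scope, min, max, avg, p99);
    }

    public static void main(String[] args) {
        // Synthetic latencies of 1..100 ms, made up purely for this illustration.
        long[] samples = new long[100];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i + 1;
        }
        System.out.println(new LatencySummary(samples).format("in period"));
    }
}
```

A consumer would feed this the difference between the receive time and the send timestamp carried by each message; a backlog shows up as uniformly large values in every column.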
At any time you can use the maprcli tool to get some information about the topic, for example:
$ maprcli stream topic info -path /sample-stream -topic fast-messages -json
The -json option is only used to return the topic information as a JSON document.
When you are done playing, you can delete the stream and all associated topics using the following command:
$ maprcli stream delete -path /sample-stream
- The topics have moved from "fast-messages" to "/sample-stream:fast-messages" and from "summary-markers" to "/sample-stream:summary-markers"
- The producer and consumer configuration parameters that are not used by MapR Streams are automatically ignored
- The producer and consumer applications are executed with the dependencies of a MapR Client, not Apache Kafka.
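As a rough sketch of what these points mean for configuration, the snippet below builds a producer configuration using only java.util.Properties. The property keys are standard Kafka producer settings; the class name ProducerSettings, the broker address, and the assumption that bootstrap.servers is among the settings MapR Streams ignores are illustrative and not taken from the sample programs.

```java
import java.util.Properties;

// Hypothetical configuration holder, not part of the sample programs.
class ProducerSettings {

    // Topic in /stream:topic form, using the stream created in this walkthrough.
    static final String TOPIC = "/sample-stream:fast-messages";

    static Properties producerProperties() {
        Properties props = new Properties();
        // Standard Kafka setting, left in place. Assumption: this is one of the
        // parameters that MapR Streams does not use and therefore ignores.
        props.setProperty("bootstrap.servers", "localhost:9092");
        // Serializer classes shipped with the Kafka client API.
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProperties();
        System.out.println("topic = " + TOPIC);
        props.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + " = " + props.getProperty(k)));
    }
}
```

Under these assumptions, the same properties could be handed to a Kafka producer in either environment; only the topic string and the runtime classpath change.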
That's it!
Note that this example was derived in part from the documentation provided by the Apache Kafka project. We have added short, realistic sample programs that illustrate how real programs are written using Kafka.