Issue 89: Add Batch Reader Support #92

maddisondavid · 2020-01-12T13:19:51Z

Change log description
Adds support for using the BatchReader instead of the StreamingReader within consumers

Purpose of the change
Fixes #89

What the code does
When reading historic data from a stream, the data can be read using either the Streaming Reader or the Batch (bounded) Reader. This adds a new -batchreaders option that signifies that the stream should be consumed using the BatchClient. In this mode, the Pravega client supplies all the segments within the stream's bounds, and the consumers are free to consume them in any order.

The existing consumer has been renamed PravegaStreamingReaderWorker, and a new PravegaBatchReaderWorker has been introduced. Each reader is assigned a subset of the Batch segments, with the aim of giving each consumer an equal number of segments. If the segment count is lower than the number of specified consumers, the consumer count is reduced so that there is one consumer per Batch segment.
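A minimal sketch of the assignment rule described above, assuming segment ranges can be treated as opaque items (plain strings stand in for Pravega's SegmentRange). The class and method names are hypothetical and this is not the PR's actual code: the consumer count is capped at the segment count, and segments are dealt out round-robin so per-consumer loads differ by at most one segment.

```java
import java.util.ArrayList;
import java.util.List;

class SegmentAssignmentSketch {

    // Hypothetical helper: split segments across at most `consumers` workers.
    // Strings stand in for SegmentRange objects in this sketch.
    static <T> List<List<T>> assign(List<T> segments, int consumers) {
        if (consumers <= 0) {
            throw new IllegalArgumentException("consumers must be > 0");
        }
        // Never create more consumers than there are segments.
        int effective = Math.min(consumers, segments.size());
        List<List<T>> assignments = new ArrayList<>();
        for (int i = 0; i < effective; i++) {
            assignments.add(new ArrayList<>());
        }
        // Round-robin: segment i goes to consumer i % effective,
        // so consumer loads differ by at most one segment.
        for (int i = 0; i < segments.size(); i++) {
            assignments.get(i % effective).add(segments.get(i));
        }
        return assignments;
    }

    public static void main(String[] args) {
        List<String> segments = List.of("test6/0", "test6/1", "test6/2", "test6/3", "test6/4");
        // 10 consumers requested but only 5 segments: one consumer per segment.
        System.out.println(assign(segments, 10));
        // 2 consumers: loads of 3 and 2 segments.
        System.out.println(assign(segments, 2));
    }
}
```

With 10 consumers and 5 segments this mirrors the "Limiting To 5 consumers" behaviour shown in the log below.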

 -batchreaders                       signifies the consumers should all be
                                     Batch Readers rather than Streaming
                                     readers

How to verify
When running with the -batchreaders option, the test logs the consumer-to-segment assignments:

>pravega-benchmark -controller tcp://localhost:9090 -consumers 10 -scope dave -stream test6 -segments 5 -time 20 -batchreaders
...
...
...
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Limiting To 5 consumers due to small number of segment ranges
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Segment Assignment 0 -> test6 0(0:2215032)
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Segment Assignment 1 -> test6 1(0:2175864)
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Segment Assignment 2 -> test6 2(0:2194224)
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Segment Assignment 3 -> test6 3(0:2177904)
2020-01-12 13:16:49:359 +0000 [main] INFO io.pravega.perf.PravegaPerfTest - Segment Assignment 4 -> test6 4(0:2189328)

As the consumers progress through their assigned segment ranges, they also log their progress:

2020-01-12 13:16:49:379 +0000 [ForkJoinPool-2-worker-4] INFO io.pravega.perf.PravegaBatchReaderWorker - id:3 Starting Segment test6, 3(0:2177904)
2020-01-12 13:16:49:379 +0000 [ForkJoinPool-2-worker-3] INFO io.pravega.perf.PravegaBatchReaderWorker - id:2 Starting Segment test6, 2(0:2194224)
2020-01-12 13:16:49:379 +0000 [ForkJoinPool-2-worker-1] INFO io.pravega.perf.PravegaBatchReaderWorker - id:0 Starting Segment test6, 0(0:2215032)
2020-01-12 13:16:49:379 +0000 [ForkJoinPool-2-worker-5] INFO io.pravega.perf.PravegaBatchReaderWorker - id:4 Starting Segment test6, 4(0:2189328)
2020-01-12 13:16:49:379 +0000 [ForkJoinPool-2-worker-2] INFO io.pravega.perf.PravegaBatchReaderWorker - id:1 Starting Segment test6, 1(0:2175864)
...
...
2020-01-12 13:16:49:595 +0000 [ForkJoinPool-2-worker-2] INFO io.pravega.perf.PravegaBatchReaderWorker - id:1 Completed Segment test6, 1(0:2175864)
2020-01-12 13:16:49:595 +0000 [ForkJoinPool-2-worker-2] INFO io.pravega.perf.PravegaBatchReaderWorker - id:1 Completed all assigned assignedSegments
2020-01-12 13:16:49:601 +0000 [ForkJoinPool-2-worker-4] INFO io.pravega.perf.PravegaBatchReaderWorker - id:3 Completed Segment test6, 3(0:2177904)
2020-01-12 13:16:49:601 +0000 [ForkJoinPool-2-worker-3] INFO io.pravega.perf.PravegaBatchReaderWorker - id:2 Completed Segment test6, 2(0:2194224)
2020-01-12 13:16:49:601 +0000 [ForkJoinPool-2-worker-4] INFO io.pravega.perf.PravegaBatchReaderWorker - id:3 Completed all assigned assignedSegments
2020-01-12 13:16:49:601 +0000 [ForkJoinPool-2-worker-3] INFO io.pravega.perf.PravegaBatchReaderWorker - id:2 Completed all assigned assignedSegments
...
...

Signed-off-by: David Maddison <david.maddison@dell.com>

claudiofahey

Looks good. I have a few comments.

claudiofahey · 2020-01-15T18:26:58Z

src/main/java/io/pravega/perf/PravegaPerfTest.java

+            return batchReaders ? getBatchConsumers() : getStreamingConsumers();
+        }
+
+        public List<ReaderWorker> getBatchConsumers() {


A key difference between the batch and streaming readers in this benchmark is that multiple benchmark processes can read together through a reader group. Adding more processes increases parallelism, and no data is read more than once.
The batch reader does not perform this coordination, so if one runs multiple reader benchmark processes, each process will read the entire stream.
This behavior is reasonable, but it should be documented.

That's a good point, and it was a subtlety in the code that I missed, i.e. that the ReaderGroup name is derived from the stream name itself, and therefore there is implicit coordination between multiple distributed processes.

https://github.com/pravega/pravega-benchmark/blob/master/src/main/java/io/pravega/perf/PravegaPerfTest.java#L290-L293

if (recreate) {
    rdGrpName = streamName + startTime;
} else {
    rdGrpName = streamName + "RdGrp";
}

Obviously, as you point out, this is harder to achieve with the BatchClient, as the coordination is basically left up to the processor itself. I'll document this current restriction and find a way of distributing the processing in another PR.

If you need to distribute the processing (in another PR), consider accepting parameters for the process index and the number of processes. Then each process can sort the available segment ranges and take a deterministic subset.
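A minimal sketch of the suggested approach, under stated assumptions: the processIndex and processCount parameters are hypothetical (the benchmark has no such options in this PR), and segment ranges are represented only by their segment numbers rather than by Pravega's SegmentRange. Because every process sorts the same list the same way, each one can pick its share independently, with no coordination.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ProcessShareSketch {

    // Hypothetical: segment ranges are identified only by segment number here.
    // Every process sorts the full list the same way, then keeps every
    // processCount-th entry starting at processIndex, so the shares are
    // disjoint and together cover every range.
    static List<Long> shareFor(List<Long> allSegments, int processIndex, int processCount) {
        if (processCount <= 0 || processIndex < 0 || processIndex >= processCount) {
            throw new IllegalArgumentException("need 0 <= processIndex < processCount");
        }
        List<Long> sorted = new ArrayList<>(allSegments);
        sorted.sort(Comparator.naturalOrder());
        List<Long> mine = new ArrayList<>();
        for (int i = processIndex; i < sorted.size(); i += processCount) {
            mine.add(sorted.get(i));
        }
        return mine;
    }

    public static void main(String[] args) {
        // Ranges may arrive in any order; sorting makes the split deterministic.
        List<Long> segments = List.of(3L, 0L, 4L, 1L, 2L);
        System.out.println("process 0 -> " + shareFor(segments, 0, 2)); // [0, 2, 4]
        System.out.println("process 1 -> " + shareFor(segments, 1, 2)); // [1, 3]
    }
}
```

With real segment ranges, the sort would need some stable key that every process computes identically; the segment number is only an assumption here.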

claudiofahey · 2020-01-15T18:32:09Z

src/main/java/io/pravega/perf/PravegaPerfTest.java

+         *
+         * @return A list of lists, each list representing the segments assigned to that consumer
+         */
+        private List<List<SegmentRange>> assignSegmentsToConsumers(List<SegmentRange> segmentRanges, int consumers) {


Consider using com.google.common.collect.Lists.partition instead of this.

Originally I did use Google's Lists.partition, but the problem with that API is that it partitions the list based on a specific partition size and NOT (as I'd assumed) by the number of required partitions.

This led to calculating the required partition size, along with handling the cases where the number of SegmentRanges wasn't exactly divisible by the number of consumers. In the end it felt a lot simpler to partition the list as done here.
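To illustrate the size-versus-count problem, here is a small sketch that mimics size-based partitioning (consecutive chunks of a fixed size, the last possibly smaller, which is how Guava's Lists.partition behaves) without depending on Guava, next to a partition-by-count alternative. Both method names are hypothetical. With 5 segment ranges and 4 consumers, the natural chunk size ceil(5/4) = 2 produces only 3 partitions.

```java
import java.util.ArrayList;
import java.util.List;

class PartitionBySizeSketch {

    // Mimics size-based partitioning: consecutive chunks of `size`,
    // with the last chunk possibly smaller.
    static <T> List<List<T>> partitionBySize(List<T> items, int size) {
        List<List<T>> out = new ArrayList<>();
        for (int start = 0; start < items.size(); start += size) {
            out.add(new ArrayList<>(items.subList(start, Math.min(start + size, items.size()))));
        }
        return out;
    }

    // Contiguous partitioning into exactly k chunks (assumes 0 < k <= items.size()):
    // the first (n % k) chunks get one extra item.
    static <T> List<List<T>> partitionByCount(List<T> items, int k) {
        int n = items.size();
        List<List<T>> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < k; i++) {
            int len = n / k + (i < n % k ? 1 : 0);
            out.add(new ArrayList<>(items.subList(start, start + len)));
            start += len;
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> segments = List.of(0, 1, 2, 3, 4);
        int consumers = 4;
        int size = (segments.size() + consumers - 1) / consumers; // ceil(5 / 4) = 2
        System.out.println(partitionBySize(segments, size));       // 3 chunks, not 4
        System.out.println(partitionByCount(segments, consumers)); // exactly 4 chunks
    }
}
```

No single chunk size gives exactly 4 partitions here (size 1 yields 5, size 2 yields 3), which is the extra bookkeeping the comment above describes.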

RaulGracia

Looks good, I left some comments.

RaulGracia · 2020-01-19T19:26:44Z

src/main/java/io/pravega/perf/ReaderWorker.java

@@ -100,6 +100,7 @@ public void EventsReaderRW() throws IOException {
                    i++;
                }
            }
+        } catch(WorkerCompleteException ignore) {


Don't we need to log anything here?

No, because the exception only exists to signal to the logic that the reader has completed; it's not an actual error condition.

The PravegaBatchReaderWorker (which is the only worker that throws this exception) logs that it has completed everything, so another log would just be redundant.
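A minimal sketch of the pattern under discussion, where an exception is used purely as a completion signal rather than an error. Only the WorkerCompleteException name comes from the PR; the BoundedWorker class, its readEvent method and the drain loop are hypothetical simplifications, not the benchmark's actual ReaderWorker code.

```java
import java.util.Iterator;
import java.util.List;

class CompletionSignalSketch {

    // Thrown only to tell the read loop there is no more assigned work; not an error.
    static class WorkerCompleteException extends Exception {
        WorkerCompleteException() {
            // Skip stack-trace capture: this is control flow, not a failure to diagnose.
            super("worker completed all assigned work", null, false, false);
        }
    }

    // Hypothetical bounded worker: returns events until its input is exhausted.
    static class BoundedWorker {
        private final Iterator<String> events;

        BoundedWorker(List<String> events) {
            this.events = events.iterator();
        }

        String readEvent() throws WorkerCompleteException {
            if (!events.hasNext()) {
                // In the PR, the worker itself logs completion, so the caller need not.
                throw new WorkerCompleteException();
            }
            return events.next();
        }
    }

    // Generic read loop: runs until the worker signals completion.
    static int drain(BoundedWorker worker) {
        int count = 0;
        try {
            while (true) {
                worker.readEvent();
                count++;
            }
        } catch (WorkerCompleteException ignore) {
            // Expected end of a bounded read; nothing further to log.
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(drain(new BoundedWorker(List.of("e1", "e2", "e3"))));
    }
}
```

Disabling stack-trace capture keeps the signal cheap, since the exception is thrown once per worker on a normal path.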

RaulGracia · 2020-01-20T09:14:59Z

src/main/java/io/pravega/perf/PravegaPerfTest.java

+                    )
+                    .collect(Collectors.toList());
+            } else {
+                readers = null;


Is this case possible? If the user does not specify consumers, we could take as default the number of segments or just 1. But if the user sets consumerCount <= 0, the tool should throw an error even before reaching this point telling the user that the input is incorrect, right?

This logic follows the same pattern as the streaming consumers. The problem is that the test doesn't know whether consumers are required; it can only infer that from consumers > 0. If consumers defaulted to 1, then a test would always have a consumer, which is not desirable.
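A small sketch of the convention described here, assuming a consumer count of 0 (the default) simply means the run has no consumers, while a negative value is rejected up front as the reviewer suggests. The readerIds method and its validation are hypothetical; the PR itself leaves readers as null in the zero case rather than returning an empty list.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ConsumerCountSketch {

    // Hypothetical: turn the -consumers value into reader ids.
    // 0 means "this run has no consumers"; negative values are invalid input.
    static List<Integer> readerIds(int consumers) {
        if (consumers < 0) {
            throw new IllegalArgumentException("-consumers must be >= 0, got " + consumers);
        }
        if (consumers == 0) {
            return Collections.emptyList();
        }
        return IntStream.range(0, consumers).boxed().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(readerIds(0)); // []
        System.out.println(readerIds(3)); // [0, 1, 2]
    }
}
```

Returning an empty list instead of null is a design choice in this sketch; it lets callers iterate without a null check.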

RaulGracia · 2020-03-18T15:31:51Z

@maddisondavid can you please have a look at the conflicts so we can merge this one?

RaulGracia · 2020-06-08T14:13:28Z

@maddisondavid any updates on this one?

Issue 89 Add Batch Reader Support

ea22133

Signed-off-by: David Maddison <david.maddison@dell.com>

maddisondavid changed the title ~~Issue 89 Add Batch Reader Support~~ Issue 89: Add Batch Reader Support Jan 12, 2020

maddisondavid mentioned this pull request Jan 15, 2020

Add support for reading with BatchClient #89

Open

claudiofahey approved these changes Jan 15, 2020

maddisondavid added 2 commits January 16, 2020 15:15

Issue 89: Throw an exception when the worker has completed all assign…

13dfca8

…ed work Signed-off-by: David Maddison <david.maddison@dell.com>

Issue 89: Simplify setting batchreaders

2703657

Signed-off-by: David Maddison <david.maddison@dell.com>

RaulGracia requested changes Jan 20, 2020

Fixed code formatting issues

a3461f3

Signed-off-by: David Maddison <david.maddison@dell.com>

maddisondavid commented Jan 12, 2020

claudiofahey left a comment

claudiofahey Jan 15, 2020

maddisondavid Jan 16, 2020

claudiofahey Jan 16, 2020

claudiofahey Jan 15, 2020

maddisondavid Jan 16, 2020

RaulGracia left a comment

RaulGracia Jan 19, 2020

maddisondavid Feb 17, 2020

RaulGracia Jan 20, 2020

maddisondavid Feb 17, 2020

RaulGracia commented Mar 18, 2020

RaulGracia commented Jun 8, 2020
