Mahmoud Ben Hassine edited this page Feb 7, 2020 · 15 revisions

What is Easy Batch?

Easy Batch is a framework that aims to simplify batch processing with Java. It was specifically designed for simple ETL jobs. Writing batch applications requires a lot of boilerplate code: reading, writing, filtering, parsing and validating data, logging and reporting, to name a few. The idea is to free you from these tedious tasks and let you focus on your batch application's logic.

Task                                  You   Easy Batch
Implement business logic              x
Handle resources I/O                        x
Data filtering / validation                 x
Type conversion                             x
Objects marshalling / unmarshalling         x
Transaction management                      x
Logging / Reporting                         x
Job Monitoring                              x

How does it work?

Easy Batch jobs are simple processing pipelines. Records are read in sequence from a data source, processed in a pipeline, and written in batches to a data sink.

Easy Batch provides the Record and Batch APIs to abstract data format and process records in a consistent way regardless of the data source/sink types.
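The read, process, write-in-batches flow described above can be illustrated with a small, library-free sketch. This is not Easy Batch's implementation or API: the `PipelineSketch` class, the `Rec` wrapper and the `run` method are hypothetical names made up for this illustration. It only shows the idea of wrapping each item in a record, passing it through a processor (which may drop it), and flushing to the sink whenever a batch is full.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration only; these types are NOT part of Easy Batch.
public class PipelineSketch {

    /** A payload together with its 1-based position in the data source. */
    public static final class Rec<P> {
        public final long number;
        public final P payload;

        public Rec(long number, P payload) {
            this.number = number;
            this.payload = payload;
        }
    }

    /**
     * Reads items one at a time, applies the processor, and hands the results
     * to the writer in batches of at most batchSize records.
     * A processor that returns null drops the record (filtering).
     * Returns the number of records handed to the writer.
     */
    public static <I, O> int run(Iterator<I> source,
                                 Function<Rec<I>, Rec<O>> processor,
                                 Consumer<List<Rec<O>>> writer,
                                 int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<Rec<O>> batch = new ArrayList<>(batchSize);
        long number = 0;
        int written = 0;
        while (source.hasNext()) {
            Rec<O> out = processor.apply(new Rec<>(++number, source.next()));
            if (out == null) {
                continue; // record was filtered out
            }
            batch.add(out);
            if (batch.size() == batchSize) {
                writer.accept(new ArrayList<>(batch)); // flush a full batch
                written += batch.size();
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            writer.accept(new ArrayList<>(batch)); // flush the last, partial batch
            written += batch.size();
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("id,user,message", "1,foo,hello", "2,bar,hi", "3,baz,hey");
        int written = PipelineSketch.<String, String>run(
                lines.iterator(),
                // drop the header line, upper-case everything else
                r -> r.number == 1 ? null : new Rec<String>(r.number, r.payload.toUpperCase()),
                batch -> System.out.println("writing batch of " + batch.size()),
                2);
        System.out.println("records written: " + written);
    }
}
```

Because the processor and writer only ever see `Rec` objects and lists of them, the same loop works whatever the source and sink are, which is the kind of consistency the Record and Batch abstractions aim for.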

Show me the code!

Let's suppose you have some tweets represented by a Tweet class and you want to transform them from CSV to XML. Here is how to do it with Easy Batch:

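// Source and sink files
Path inputFile = Paths.get("tweets.csv");
Path outputFile = Paths.get("tweets.xml");

Job job = new JobBuilder()
         .reader(new FlatFileRecordReader(inputFile))   // read the CSV file record by record
         .filter(new HeaderRecordFilter())              // skip the header record
         .mapper(new DelimitedRecordMapper(Tweet.class, "id", "user", "message")) // map delimited fields to Tweet fields
         .marshaller(new XmlRecordMarshaller(Tweet.class)) // marshal each Tweet to XML
         .writer(new FileRecordWriter(outputFile))      // write the XML to the output file
         .batchSize(10)                                 // write records in batches of 10
         .build();

// Run the job, get its report, then release the executor's resources
JobExecutor jobExecutor = new JobExecutor();
JobReport report = jobExecutor.execute(job);
jobExecutor.shutdown();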
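The example above refers to a `Tweet` class without showing it. Here is a minimal sketch of what such a class might look like. It assumes the field names match the `"id"`, `"user"` and `"message"` names passed to `DelimitedRecordMapper` and that the mapper needs a no-argument constructor and setters. The field types are a guess. XML marshalling may also require an annotation such as JAXB's `@XmlRootElement` on the class; that is an assumption too, and it is left out here to keep the sketch dependency-free.

```java
// Hypothetical Tweet bean; field types and accessor style are assumptions.
public class Tweet {

    private int id;
    private String user;
    private String message;

    // No-argument constructor, assumed to be needed for reflection-based mapping
    public Tweet() {
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getUser() {
        return user;
    }

    public void setUser(String user) {
        this.user = user;
    }

    public String getMessage() {
        return message;
    }

    public void setMessage(String message) {
        this.message = message;
    }

    @Override
    public String toString() {
        return "Tweet{id=" + id + ", user='" + user + "', message='" + message + "'}";
    }
}
```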

Easy Batch makes your code declarative, intuitive, easy to read, understand, test and maintain.
