Mahmoud Ben Hassine edited this page Feb 15, 2020 · 14 revisions

1. How does Easy Batch read data from a data source?

Easy Batch streams data record by record from the data source. Depending on the data source type, a record can be a line in a flat file, a tag in an XML file, a row in a database table, etc. The RecordReader abstraction is intended to be implemented with a streaming API so that the data source is not entirely loaded in memory (loading everything upfront is the main cause of java.lang.OutOfMemoryError in many batch applications). There are several implementations of the RecordReader interface to read data from a variety of data sources. Please refer to the user guide for all details about available record readers.
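
The streaming idea behind RecordReader can be illustrated with plain java.io. This is a self-contained sketch of the pattern, not the actual Easy Batch interface:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Minimal sketch of the streaming pattern behind RecordReader:
// records are pulled one at a time, so the data source is never
// fully loaded in memory.
public class StreamingReaderSketch {
    public static void main(String[] args) throws IOException {
        // Stand-in for a flat file; a real reader would wrap a FileReader.
        try (BufferedReader reader = new BufferedReader(
                new StringReader("record1\nrecord2\nrecord3"))) {
            String line;
            int number = 0;
            while ((line = reader.readLine()) != null) { // one record per call
                number++;
                System.out.println("record " + number + ": " + line);
            }
        }
    }
}
```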

2. How does Easy Batch write data to a data sink?

Easy Batch writes records in batches. Most APIs provide a way to write data in bulk for performance reasons, and the RecordWriter abstraction is designed to work this way: the writeRecords method takes a batch of records and writes them as a unit to the data sink. Usually, the write operation is performed within a transaction boundary, so records are either written as a unit or rolled back for re-processing. Easy Batch provides several implementations of the RecordWriter interface to write data to a variety of data sinks. Please refer to the user guide for all details about available record writers.
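
The all-or-nothing semantics of a batch write can be sketched without any database, using an in-memory sink as a stand-in for the transactional target (a self-contained illustration, not the real RecordWriter contract):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the all-or-nothing batch write behind RecordWriter#writeRecords:
// the batch is committed as a unit, or rolled back so it can be re-processed.
public class BatchWriteSketch {
    private final List<String> sink = new ArrayList<>();

    public void writeRecords(List<String> batch) {
        List<String> snapshot = new ArrayList<>(sink); // transaction begin
        try {
            for (String record : batch) {
                if (record.isEmpty()) {
                    throw new IllegalArgumentException("bad record");
                }
                sink.add(record);
            }
            // transaction commit: the sink keeps the whole batch
        } catch (RuntimeException e) {
            sink.clear();           // transaction rollback:
            sink.addAll(snapshot);  // restore the pre-batch state
        }
    }

    public static void main(String[] args) {
        BatchWriteSketch writer = new BatchWriteSketch();
        writer.writeRecords(List.of("a", "b")); // committed
        writer.writeRecords(List.of("c", "")); // rolled back as a unit
        System.out.println(writer.sink);       // prints [a, b]
    }
}
```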

3. Can I use Easy Batch to process non-textual data?

Yes. Even though the common use cases are about processing textual data, the Easy Batch Record abstraction can be implemented for any type of input data. For example, in a scenario where you need to compress a set of images with Java in batch mode, a record can be one image file. The Easy Batch API is generic and can be used to process any type of input data.

4. Can I use the Bean Validation API (JSR 303/349) with Easy Batch?

Yes. Easy Batch uses the reference implementation, Hibernate Validator, to validate domain objects. For all details about how to validate data using the Bean Validation API with Easy Batch, please refer to the user guide.

5. Does Easy Batch implement the "Batch Applications for the Java Platform" API (JSR 352)?

No. Easy Batch was designed and implemented before JSR 352 was submitted.

6. Can I monitor Easy Batch execution with JMX?

Yes. You can enable JMX monitoring with JobBuilder#enableJmx(). This will register a JMX MBean named org.jeasy.batch.jmx.monitor:name=YourJobName at job startup. You can use any standard JMX-compliant tool to monitor your job metrics. You can also use a JobMonitorProxy and register a JobMonitoringListener to listen to push notifications sent by the job at runtime.
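
The standard JMX mechanism the job relies on can be shown with a self-contained sketch. The MBean name follows the pattern mentioned above, but the JobMonitor class and WriteCount attribute are illustrative stand-ins, not the exact MBean Easy Batch registers:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Sketch of standard JMX monitoring: register an MBean under the
// platform MBean server and read one of its attributes, as a JMX
// tool (e.g. JConsole) would do for a running job.
public class JmxSketch {
    public interface JobMonitorMBean { long getWriteCount(); }

    public static class JobMonitor implements JobMonitorMBean {
        public long getWriteCount() { return 42; } // illustrative metric
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("org.jeasy.batch.jmx.monitor:name=myJob");
        server.registerMBean(new JobMonitor(), name);
        Object count = server.getAttribute(name, "WriteCount");
        System.out.println("writeCount=" + count); // prints writeCount=42
    }
}
```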

7. Can I use Easy Batch with Spring?

Yes. The JobFactoryBean can be used to configure and declare Easy Batch jobs as Spring beans. This factory bean is provided by the easy-batch-spring module.
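
A declaration could look roughly like the following. The fully qualified class name and the property names are assumptions for illustration only; refer to the easy-batch-spring module for the exact contract:

```xml
<!-- Illustrative sketch: declare an Easy Batch job as a Spring bean.
     Class and property names are assumed, not verified against the module. -->
<bean id="job" class="org.jeasy.batch.spring.JobFactoryBean">
    <property name="recordReader" ref="recordReader"/>
    <property name="recordWriter" ref="recordWriter"/>
</bean>
```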

8. How does Easy Batch compare to Spring Batch?

Comparing Easy Batch and Spring Batch head to head would be unfair: even though both frameworks fundamentally try to solve the same problem, they are conceptually different at several levels:

  • Job structure: A job in Spring Batch is a collection of steps. A step can be a single task or chunk-oriented. In Easy Batch, there is no concept of step. A job in Easy Batch is similar to a Spring Batch job with a single chunk-oriented step using an in-memory job repository.
  • Job definition: Spring Batch provides a DSL to define the execution flow of steps within a job. In Easy Batch, there is no such DSL. Creating a workflow of jobs is left to an external workflow engine like Easy Flows.
  • Job execution: A Spring Batch job can have multiple job instances (identified by their identifying job parameters). Each job instance may in turn have multiple executions. In Easy Batch, there are no such job instance or job execution concepts. Jobs are Callable objects that can be executed with a JobExecutor or an ExecutorService.
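
The execution model in the last point can be sketched with a plain ExecutorService. The SimpleJob record below is a stand-in for an Easy Batch Job (which would return a JobReport rather than a String):

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the execution model: a job is just a Callable, so a plain
// ExecutorService can run several jobs, here two in parallel.
public class CallableJobSketch {
    record SimpleJob(String name) implements Callable<String> {
        public String call() {
            return name + ": COMPLETED"; // a real job would return a JobReport
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        List<Future<String>> reports = executor.invokeAll(
                List.of(new SimpleJob("job1"), new SimpleJob("job2")));
        for (Future<String> report : reports) {
            System.out.println(report.get());
        }
        executor.shutdown();
    }
}
```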

That said, Spring Batch is an advanced batch processing framework with a very rich feature set, including flows, remoting, partitioning, automatic retry on failure, etc. Easy Batch is a bit like Spring Batch, but much smaller and not as bright! It is a simple and lightweight framework that can be learned quickly and used easily for the majority of batch processing use cases. Easy Batch does not compete with Spring Batch; it tries to provide an alternative that is easier to learn, configure and use. A detailed comparison between Easy Batch and Spring Batch can be found in dedicated blog posts.

9. Why does Easy Batch not persist job state in a database like Spring Batch?

Easy Batch started from the belief that batch jobs should be designed to be restartable without relying on any tool. One of the best characteristics a batch job can have is idempotency. If a job cannot be implemented in an idempotent way, there are always patterns to make it restartable without persisting its state, such as the process indicator pattern or the staging table pattern.
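
The process indicator pattern can be sketched in a few lines. The in-memory map below is a stand-in for a staging table with a "processed" column; on restart, the job simply skips records already flagged:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the process indicator pattern: each record carries a flag,
// and a restarted job only picks up records not yet marked as processed.
public class ProcessIndicatorSketch {
    public static void main(String[] args) {
        // Stand-in for a staging table: record id -> processed flag.
        Map<String, Boolean> staging = new LinkedHashMap<>();
        staging.put("r1", true);   // processed before the failure
        staging.put("r2", false);
        staging.put("r3", false);

        // On restart, only unprocessed records are handled again.
        for (Map.Entry<String, Boolean> entry : staging.entrySet()) {
            if (!entry.getValue()) {
                System.out.println("processing " + entry.getKey());
                entry.setValue(true); // set the indicator once done
            }
        }
    }
}
```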

Persisting the job state during execution is not only expensive in terms of performance, but also requires additional setup, configuration and maintenance.

If you think about it, the failure rate of batch jobs is always negligible compared to the success rate (even when the failure rate is relatively high!), so adding such a feature would penalize the majority of job executions for the benefit of a minority of failed ones.

That said, it is possible to persist the job state in a persistent store if necessary. You can find an example in the Restart a failed job tutorial.

10. Why does Easy Batch not provide a Step concept like Spring Batch?

Easy Batch does not implement batch jobs as a workflow of steps and this is on purpose. Implementing workflows should be delegated to workflow engines like Easy Flows. Easy Batch was specifically designed for simple ETL jobs. A job in Easy Batch is a single task that has the sole responsibility of reading/processing/writing data from a source to a target.

11. How can I configure Easy Batch logging?

Easy Batch used to rely on the java.util.logging API to minimise dependencies. As of v5.3, SLF4J is used for logging, so you can use any logging framework compatible with SLF4J.
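
For example, assuming Logback as the SLF4J binding, a logback.xml fragment along these lines would control the verbosity of Easy Batch's own log output (the logger name matches the org.jeasy.batch package; the pattern and levels are illustrative):

```xml
<configuration>
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d %-5level %logger - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Tune Easy Batch's logging separately from the rest of the app -->
    <logger name="org.jeasy.batch" level="INFO"/>

    <root level="WARN">
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>
```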

12. Is there a micro benchmark for Easy Batch jobs?

Strictly speaking, a "micro" benchmark is not the right notion for batch applications. Benchmarking batch applications is hard, as the majority of real-world jobs interact with external resources (databases, file systems, etc).

We used to provide a JMH-based benchmark to measure the potential overhead of Easy Batch jobs, but it has been removed because we think JMH is not the right tool to benchmark batch jobs. Benchmarking batch applications depends heavily on the use case, so the best way to measure any potential overhead is to try Easy Batch on your own use case (and compare it to other frameworks).

13. I have another question. What should I do?

Feel free to ask your question on Gitter.