Key APIs
Easy Batch is all about processing a data source in batch mode:

- First, you build a `Job` using a `JobBuilder` by specifying `JobParameters`
- Then, you execute your job using a `JobExecutor`
- Finally, you get a `JobReport` of the job run

Each instance of the `Job` interface represents an execution.
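The three steps above can be sketched as follows. This is a minimal sketch assuming the Easy Batch 5.x API (`org.easybatch.core.*`); class and package names may differ in other versions, and the reader and writer used here are purely illustrative.

```java
import org.easybatch.core.job.Job;
import org.easybatch.core.job.JobBuilder;
import org.easybatch.core.job.JobExecutor;
import org.easybatch.core.job.JobReport;
import org.easybatch.core.reader.IterableRecordReader;

import java.util.Arrays;

public class JobSketch {
    public static void main(String[] args) {
        // 1. Build a Job using a JobBuilder
        Job job = JobBuilder.aNewJob()
                .named("hello-job")
                .reader(new IterableRecordReader(Arrays.asList("foo", "bar")))
                .build();

        // 2. Execute the job using a JobExecutor
        JobExecutor jobExecutor = new JobExecutor();
        JobReport report = jobExecutor.execute(job);
        jobExecutor.shutdown();

        // 3. Inspect the JobReport of the job run
        System.out.println(report);
    }
}
```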
Writing data to a data sink is optional.
The framework provides the `Record` and `Batch` APIs to abstract data formats and process records in a consistent way, regardless of the data source type. A record can be a line in a flat file, a tag in an XML file, a row in a database table, a file in a folder, etc.

The generic `Record` interface is an abstraction over all record types:
```java
public interface Record<P> {

    /** Return the record header. */
    Header getHeader();

    /** Return the record payload. */
    P getPayload();
}
```
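As a sketch of how this interface might be implemented, consider a record whose payload is a line of text read from a flat file. The `Header` and `StringRecord` classes below are simplified, hypothetical stand-ins, not the framework's own classes:

```java
import java.util.Date;

public class RecordSketch {

    // Simplified stand-in for the framework's record header
    static class Header {
        private final Long number;       // physical record number
        private final String source;     // data source name
        private final Date creationDate; // when the record was read

        Header(Long number, String source, Date creationDate) {
            this.number = number;
            this.source = source;
            this.creationDate = creationDate;
        }

        public Long getNumber() { return number; }
        public String getSource() { return source; }
        public Date getCreationDate() { return creationDate; }
    }

    // The generic Record interface from above
    interface Record<P> {
        Header getHeader();
        P getPayload();
    }

    // A record whose payload is a line of text, e.g. from a flat file
    static class StringRecord implements Record<String> {
        private final Header header;
        private final String payload;

        StringRecord(Header header, String payload) {
            this.header = header;
            this.payload = payload;
        }

        public Header getHeader() { return header; }
        public String getPayload() { return payload; }
    }

    public static void main(String[] args) {
        Record<String> record = new StringRecord(
                new Header(1L, "input.csv", new Date()), "hello,world");
        // prints "input.csv:hello,world"
        System.out.println(record.getHeader().getSource() + ":" + record.getPayload());
    }
}
```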
A record has a header and a payload:

- The header contains metadata about the record, such as the data source from which it was read, its physical number, its creation date, etc.
- The payload is the raw content of the record. It is generic since it depends on the data source type: the payload can be of any type, so a record can represent any kind of input data.
Records are read in sequence and submitted to a processing pipeline, where each record is piped from one processor to the next:
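A minimal sketch of such a pipeline (the `RecordProcessor` interface and the processors below are simplified, hypothetical stand-ins for the framework's processing pipeline):

```java
import java.util.Arrays;
import java.util.List;

public class PipelineSketch {

    // Simplified stand-in for a record processor: takes a record, returns a record
    interface RecordProcessor {
        String process(String record);
    }

    // Pipe a record through each processor in the pipeline, in order
    static String process(String record, List<RecordProcessor> pipeline) {
        for (RecordProcessor processor : pipeline) {
            record = processor.process(record);
        }
        return record;
    }

    public static void main(String[] args) {
        List<RecordProcessor> pipeline = Arrays.asList(
                String::trim,          // clean the record
                String::toUpperCase);  // transform the record

        // Records are read in sequence and submitted to the pipeline
        for (String record : Arrays.asList("  foo ", " bar  ")) {
            System.out.println(process(record, pipeline));
        }
    }
}
```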
Once a batch is complete, records are written in batches to the data sink. A batch of records is represented by the `Batch` API.
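A sketch of this batching behavior (the `Batch` class and the list-based "sink" below are simplified stand-ins for the framework's `Batch` API and a real data sink):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BatchSketch {

    // Simplified stand-in for the framework's Batch API: a group of records
    static class Batch {
        private final List<String> records = new ArrayList<>();
        void add(String record) { records.add(record); }
        int size() { return records.size(); }
        List<String> getRecords() { return records; }
    }

    // Group records into batches of the given size, writing each complete batch to the sink
    static List<Batch> writeInBatches(List<String> records, int batchSize, List<String> sink) {
        List<Batch> batches = new ArrayList<>();
        Batch current = new Batch();
        for (String record : records) {
            current.add(record);
            if (current.size() == batchSize) {   // batch complete: write it to the sink
                sink.addAll(current.getRecords());
                batches.add(current);
                current = new Batch();
            }
        }
        if (current.size() > 0) {                // write the last, partial batch
            sink.addAll(current.getRecords());
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<Batch> batches = writeInBatches(
                Arrays.asList("r1", "r2", "r3", "r4", "r5"), 2, sink);
        // prints "3 batches written, sink = [r1, r2, r3, r4, r5]"
        System.out.println(batches.size() + " batches written, sink = " + sink);
    }
}
```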
Easy Batch is created by Mahmoud Ben Hassine with the help of some awesome contributors.