
Key APIs

Mahmoud Ben Hassine edited this page Feb 7, 2020 · 4 revisions

The Job Domain APIs

Easy Batch is all about processing a data source in batch mode:

  • First, you build a Job using a JobBuilder, specifying its JobParameters.
  • Then, you execute the job using a JobExecutor.
  • Finally, you get a JobReport of the job run.

Each instance of the Job interface represents an execution. Writing data to a data sink is optional.
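The build, execute, report flow can be sketched with plain Java. This is a minimal illustration under stated assumptions: the `Sketch*` classes below are hypothetical stand-ins written for this page, not the Easy Batch classes, and the real JobBuilder, JobExecutor and JobReport offer more options than shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;

// Hypothetical stand-in for a job report: just a name and two counters.
final class SketchReport {
    final String jobName;
    final int readCount;
    final int writeCount;

    SketchReport(String jobName, int readCount, int writeCount) {
        this.jobName = jobName;
        this.readCount = readCount;
        this.writeCount = writeCount;
    }
}

// Hypothetical stand-in for a job: one instance models one execution.
final class SketchJob implements Callable<SketchReport> {
    private final String name;
    private final List<String> source;
    private final List<String> sink; // null when the job writes nothing

    SketchJob(String name, List<String> source, List<String> sink) {
        this.name = name;
        this.source = source;
        this.sink = sink;
    }

    @Override
    public SketchReport call() {
        int read = 0;
        int written = 0;
        for (String item : source) {
            read++;
            if (sink != null) { // writing to a data sink is optional
                sink.add(item);
                written++;
            }
        }
        return new SketchReport(name, read, written);
    }
}

// Hypothetical builder collecting the job's parameters.
final class SketchJobBuilder {
    private String name = "job";
    private List<String> source = new ArrayList<>();
    private List<String> sink;

    SketchJobBuilder named(String name) { this.name = name; return this; }
    SketchJobBuilder source(List<String> source) { this.source = source; return this; }
    SketchJobBuilder sink(List<String> sink) { this.sink = sink; return this; }
    SketchJob build() { return new SketchJob(name, source, sink); }
}

// Hypothetical executor: runs the job on a worker thread and waits for its report.
final class SketchJobExecutor {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();

    SketchReport execute(SketchJob job) {
        try {
            return pool.submit(job).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() { pool.shutdown(); }
}

class JobLifecycleDemo {
    static SketchReport run() {
        // 1. Build the job (no sink configured here).
        SketchJob job = new SketchJobBuilder()
                .named("sketch-job")
                .source(Arrays.asList("a", "b", "c"))
                .build();
        // 2. Execute it.
        SketchJobExecutor executor = new SketchJobExecutor();
        try {
            // 3. Get the report of the run.
            return executor.execute(job);
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) {
        SketchReport report = run();
        System.out.println(report.jobName + ": read=" + report.readCount
                + ", written=" + report.writeCount);
    }
}
```

Because no sink is configured, the sketch reads three items and writes none, which mirrors the point that writing to a data sink is optional.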

The Record and Batch APIs

The framework provides the Record and Batch APIs to abstract the data format and process records in a consistent way regardless of the data source type. A record can be, for example, a line in a flat file, a tag in an XML file, a row in a database table, or a file in a folder.

The generic Record interface is an abstraction of all record types:

public interface Record<P> {

    /** Return the record header */
    Header getHeader();

    /** Return the record payload */
    P getPayload();

}

A record has a header and a payload:

  • The header contains metadata about the record, such as the data source from which the record was read, its physical number, and its creation date.
  • The payload is the raw content of the record. It is generic because it depends on the data source type, so a record can carry any type of input data.
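As an illustration of the header/payload split, here is a minimal sketch of a record whose payload is one line of a flat file. The `Record` interface repeats the one shown above. The `Header` class is a simplified stand-in written for this example (its fields follow the metadata listed above, but it is not the framework's class), and `LineRecord`, `RecordDemo` and the file name `customers.csv` are hypothetical.

```java
import java.time.LocalDateTime;

// Same shape as the Record interface shown above.
interface Record<P> {
    Header getHeader();
    P getPayload();
}

// Simplified stand-in for the record header (an assumption, not the real class).
final class Header {
    private final long number;                // physical number of the record
    private final String source;              // data source the record was read from
    private final LocalDateTime creationDate; // when the record was created

    Header(long number, String source, LocalDateTime creationDate) {
        this.number = number;
        this.source = source;
        this.creationDate = creationDate;
    }

    long getNumber() { return number; }
    String getSource() { return source; }
    LocalDateTime getCreationDate() { return creationDate; }
}

// Hypothetical record type: the payload is the raw text of one line.
final class LineRecord implements Record<String> {
    private final Header header;
    private final String payload;

    LineRecord(Header header, String payload) {
        this.header = header;
        this.payload = payload;
    }

    @Override
    public Header getHeader() { return header; }

    @Override
    public String getPayload() { return payload; }
}

class RecordDemo {
    // Builds the record for the second line of a hypothetical customers.csv file.
    static Record<String> secondLine() {
        Header header = new Header(2L, "customers.csv", LocalDateTime.now());
        return new LineRecord(header, "42,Jane Doe");
    }

    public static void main(String[] args) {
        Record<String> record = secondLine();
        System.out.println("Record " + record.getHeader().getNumber()
                + " from " + record.getHeader().getSource()
                + ": " + record.getPayload());
    }
}
```

A record for a database row or an XML tag would use the same header and only change the payload type `P`.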

Records are read in sequence and submitted to a processing pipeline, where each record is passed from one processor to the next.

Processed records are grouped into batches, and each batch is written to the data sink once it is complete. A batch of records is represented by the Batch API.
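The pipeline and batching behaviour can be sketched in a few lines of plain Java. This is an illustration only: `PipelineSketch` is a hypothetical helper, plain strings stand in for records, `java.util.function.Function` stands in for processors, and a list of lists stands in for the data sink. Writing the final, partially filled batch is also an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class PipelineSketch {

    /**
     * Reads items in sequence, pipes each one through the processors in order,
     * and hands every completed batch to the sink (modelled as a list of batches).
     */
    static List<List<String>> run(List<String> source,
                                  List<Function<String, String>> processors,
                                  int batchSize) {
        List<List<String>> sink = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        for (String item : source) {
            String current = item;
            for (Function<String, String> processor : processors) {
                current = processor.apply(current); // output of one is input of the next
            }
            batch.add(current);
            if (batch.size() == batchSize) {        // batch complete: write it
                sink.add(batch);
                batch = new ArrayList<>();
            }
        }
        if (!batch.isEmpty()) {                     // write the last, partial batch
            sink.add(batch);
        }
        return sink;
    }

    static List<List<String>> demo() {
        List<Function<String, String>> processors = new ArrayList<>();
        processors.add(String::trim);
        processors.add(String::toUpperCase);
        return run(Arrays.asList(" a ", "b", " c"), processors, 2);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [[A, B], [C]]
    }
}
```

With a batch size of 2, the three input items end up in two batches: the first is written as soon as it holds two processed items, the second when the source is exhausted.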
