Tiny ETL tool written in Go.
ETL stands for Extract-Transform-Load.
The project was written as a test task for a job interview.
The original task: filter a list of customers to those within 100 km of a geo point and return ID-name pairs sorted by ID in ascending order.
- Workflow is defined by a chain of workers.
- Uses streaming processing where possible to minimize memory footprint.
- Data items are processed through the workflow wrapped in a `WorkItem` container that exposes the data item as `Data() interface{}`. It also holds a reference to the previous worker for easier logging and troubleshooting in case of unexpected input.
- Asynchronous processing is not built in at the moment but should be fairly easy to add: the current implementation already allows workers to return an `etl.Iterator` and use goroutines for async processing.
- The library was designed to require minimal boilerplate and coding from developers.
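The ideas above (a chain of workers, items wrapped with a reference to their producer, iterator-returning workers, goroutines for async work) can be sketched as follows. `WorkItem`, `Iterator`, `Worker`, `Chain` and the helper workers here are hypothetical stand-ins written for illustration; they are not tinyetl's actual `etl` API, and the producer is tracked by name rather than by a worker reference.

```go
package main

import (
	"fmt"
	"strings"
)

// WorkItem wraps a data item and remembers which worker produced it (hypothetical).
type WorkItem struct {
	data     interface{}
	producer string // name of the previous worker, handy for troubleshooting
}

func (w WorkItem) Data() interface{} { return w.data }

// Iterator yields items one at a time; ok is false once the stream is exhausted.
type Iterator func() (item WorkItem, ok bool)

// Worker turns an input stream into an output stream (hypothetical).
type Worker struct {
	Name string
	Run  func(in Iterator, name string) Iterator
}

// Chain connects workers so each one consumes the previous worker's output lazily.
func Chain(src Iterator, workers ...Worker) Iterator {
	it := src
	for _, w := range workers {
		it = w.Run(it, w.Name)
	}
	return it
}

// fromSlice acts as a source: items are produced on demand, one per call.
func fromSlice(values []string) Iterator {
	i := 0
	return func() (WorkItem, bool) {
		if i >= len(values) {
			return WorkItem{}, false
		}
		i++
		return WorkItem{data: values[i-1], producer: "source"}, true
	}
}

// upper is a synchronous transform; it reports the producer on unexpected input.
func upper(in Iterator, name string) Iterator {
	return func() (WorkItem, bool) {
		item, ok := in()
		if !ok {
			return WorkItem{}, false
		}
		s, isString := item.Data().(string)
		if !isString {
			panic(fmt.Sprintf("%s: unexpected input %T from %s", name, item.Data(), item.producer))
		}
		return WorkItem{data: strings.ToUpper(s), producer: name}, true
	}
}

// async drains the upstream iterator in a goroutine and hands items over a channel.
func async(in Iterator, name string) Iterator {
	ch := make(chan WorkItem, 16)
	go func() {
		defer close(ch)
		for item, ok := in(); ok; item, ok = in() {
			ch <- WorkItem{data: item.Data(), producer: name}
		}
	}()
	return func() (WorkItem, bool) {
		item, ok := <-ch
		return item, ok
	}
}

func main() {
	out := Chain(fromSlice([]string{"alice", "bob"}),
		Worker{Name: "upper", Run: upper},
		Worker{Name: "async", Run: async},
	)
	for item, ok := out(); ok; item, ok = out() {
		fmt.Println(item.Data(), "from", item.producer)
	}
}
```

Because every worker pulls one item at a time, nothing is buffered beyond the channel capacity. The `async` worker in this sketch leaks its goroutine if the consumer stops reading early; a real implementation would need some form of cancellation.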
- `/examples/customers/main.go` - main entry point for the job interview test.
- `/examples/customers/customerscli/etl_workflow.go` - ETL workflow initialization specific to our use case. This is where the filter and sorter are created.
- `/etl/` - ETL workflow library.
- `/etl/workers/` - built-in ETL workflow workers.
To run the program you need Go installed, preferably version 1.10.
- Change the current directory to `examples/customers`:
cd $GOPATH/src/github.com/astec/tinyetl/examples/customers
- Get all Go source code dependencies:
go get ./...
- Build the program using the Go compiler:
go build .
This should produce a `customers` executable file (`customers.exe` on Windows).
- To get hints on program arguments, run it with the `--help` flag:
>customers.exe --help
usage: customers.exe [<flags>]

Flags:
      --help         Show context-sensitive help (also try --help-long and --help-man).
  -i, --input=INPUT  Input file or URL
  -s, --sort=SORT    Specifies how to sort customers: id, name. Prepend '-' for descending order.
- The file to be processed can be specified with the `--input` parameter or its short form `-i`:
customers --input=customers.txt
It defaults to `customers.txt`.
- Sorting can be specified with the `--sort` parameter or its short form `-s`:
customers --sort=id
Currently two options are supported: `id` and `name`. To sort in descending order, prepend the value with `-`:
customers --sort=-id
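A `--sort` value such as `id`, `name` or `-id` could be interpreted along the lines of the sketch below. `Customer`, `parseSort` and `sortCustomers` are hypothetical names, and the real tool may parse and apply the flag differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Customer is a hypothetical ID-name pair.
type Customer struct {
	ID   int
	Name string
}

// parseSort splits a --sort value into a field name and a direction.
// A leading '-' requests descending order.
func parseSort(spec string) (field string, desc bool, err error) {
	desc = strings.HasPrefix(spec, "-")
	field = strings.TrimPrefix(spec, "-")
	if field != "id" && field != "name" {
		return "", false, fmt.Errorf("unsupported sort field %q (want id or name)", field)
	}
	return field, desc, nil
}

// sortCustomers sorts the slice in place according to a --sort value.
func sortCustomers(customers []Customer, spec string) error {
	field, desc, err := parseSort(spec)
	if err != nil {
		return err
	}
	less := func(i, j int) bool {
		a, b := customers[i], customers[j]
		if desc {
			a, b = b, a // swapping operands reverses the order
		}
		if field == "name" {
			return a.Name < b.Name
		}
		return a.ID < b.ID
	}
	sort.SliceStable(customers, less)
	return nil
}

func main() {
	customers := []Customer{{2, "Bob"}, {1, "Carol"}, {3, "Alice"}}
	if err := sortCustomers(customers, "-id"); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(customers)
}
```

`sort.SliceStable` keeps customers with equal keys in their input order, which makes the output deterministic when sorting by name.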
Tests, including unit tests, are implemented using the standard Go testing conventions.
Test files are located next to the code files, with a `_test` suffix, e.g.:
code.go
code_test.go
To run all tests:
go test ./...
Some quick links for tests:
- End-to-end tests: /examples/customers/customerscli/end2end_test.go
- To minimize memory footprint, the `CustomerExtended` structure is mapped to `CustomerShot` before sorting.
- ffjson autogenerated code is used to speed up JSON unmarshalling.
- To speed up development, the input file name defaults to `customers.txt` if not specified.
- The utility should support an `--input` parameter given as a comma-separated list of files, though this has not been tested yet.
This is a demo project written as a test task. Due to time and purpose constraints, it lacks a few things essential for production use, e.g.:
- Logging
- Statistics and performance counters
- etc.