Why Gleam

Why

Previously I created https://github.com/chrislusf/glow, a Go-based distributed computing system. One thing I like about it is that it is pure Go and statically compiled. The downside, however, is that its computation flow is fixed: there is no easy way to dynamically send a different computation to remote executors.

Gleam differs from glow in that the mappers and reducers are pre-registered instead of being identified by their execution order. This gives much more flexibility to compose the computation flow dynamically.

Dynamic Computation

Gleam resolves glow's fixed-flow limitation: the computation flow can be adjusted dynamically.

Distributed Parallel Unix Pipe Tools

Gleam also supports the Unix pipe tool set, such as "grep", "awk", "tr", "sort", "uniq", etc. These are simple, small, and very efficient tools that can be combined to do great things. However, there has been no system that can run these tools in a distributed manner.

Go's support for concurrent programming makes it easy to run these shell scripts in parallel.
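As a rough illustration of that point, the sketch below runs one copy of a Unix tool per data shard, each in its own goroutine, using only `os/exec` and `sync`. It is an assumption-laden local stand-in, not how Gleam schedules work: the `runOnShards` helper is hypothetical, all shards are processed on one machine, and it assumes `tr` is available on the `PATH`.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"sync"
)

// runOnShards pipes each shard through its own instance of an external
// command, running all instances concurrently. Results keep shard order.
// Hypothetical helper: a real cluster would place shards on different hosts.
func runOnShards(shards []string, name string, args ...string) ([]string, error) {
	results := make([]string, len(shards))
	errs := make([]error, len(shards))
	var wg sync.WaitGroup
	for i, shard := range shards {
		wg.Add(1)
		go func(i int, shard string) {
			defer wg.Done()
			cmd := exec.Command(name, args...)
			cmd.Stdin = strings.NewReader(shard) // shard data becomes the tool's stdin
			out, err := cmd.Output()             // collect the tool's stdout
			results[i], errs[i] = string(out), err
		}(i, shard)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	shards := []string{"apple\nbanana\n", "cherry\napricot\n"}
	// Like running `tr a-z A-Z` on every shard at the same time.
	res, err := runOnShards(shards, "tr", "a-z", "A-Z")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", res)
}
```

Each goroutine writes only to its own slice index, so no extra locking is needed beyond the `WaitGroup`.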

Pipe Execution for any scripting language

Gleam's Pipe() function works essentially the same way as a Unix pipeline. In addition to all the basic Unix tools, you can use anything written in Python/Ruby/Shell/Java/C/C++, or mix them together, to do distributed computing.

Compared to Spark

Spark is a popular system, and Gleam has more similarities to it than differences. However, Gleam has some advantages:

- Memory efficient: one server can host many more executors.
- OS-managed memory in separate OS processes: no more JVM memory tuning.
- Fast to set up and run: the Gleam agents and the Gleam master are very simple and very fast to set up.
