why gleam
Previously I created https://github.com/chrislusf/glow, a Go-based distributed computing system. One thing I like about it is that it is pure Go and statically compiled. The downside is that the computation flow is fixed: there is no easy way to dynamically send a different computation to remote executors.
Gleam resolves this issue: the computation flow can be adjusted dynamically.
Gleam differs from glow in that the mappers and reducers are pre-registered instead of depending on execution order. This gives much more flexibility to compose the computation flow dynamically.
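To make the idea of pre-registered mappers concrete, here is a minimal sketch in plain Go. It is not Gleam's actual API: `Mapper`, `Register`, and `RunFlow` are hypothetical names invented for this illustration. It only shows why naming functions up front lets a flow be described as data and composed at run time.

```go
package main

import (
	"fmt"
	"strings"
)

// Mapper turns one input line into zero or more output lines.
// Hypothetical type for illustration; not part of Gleam.
type Mapper func(line string) []string

// registry holds every mapper under the name it was registered with.
var registry = map[string]Mapper{}

// Register stores a mapper so that a flow can refer to it by name.
func Register(name string, m Mapper) { registry[name] = m }

// RunFlow applies the named mappers, in order, to the input lines.
// It fails if a step names a mapper that was never registered.
func RunFlow(steps []string, input []string) ([]string, error) {
	lines := input
	for _, name := range steps {
		m, ok := registry[name]
		if !ok {
			return nil, fmt.Errorf("mapper %q is not registered", name)
		}
		var next []string
		for _, line := range lines {
			next = append(next, m(line)...)
		}
		lines = next
	}
	return lines, nil
}

func init() {
	// Registration happens once, before any flow is built.
	Register("tokenize", func(line string) []string { return strings.Fields(line) })
	Register("lower", func(line string) []string { return []string{strings.ToLower(line)} })
}

func main() {
	// The flow is just a list of names, so it can be decided at run time.
	out, err := RunFlow([]string{"tokenize", "lower"}, []string{"Hello Gleam", "Hello Go"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Because a step is identified by its name and not by the order in which it was compiled into the program, the list of steps could in principle be sent to any executor that registered the same functions.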
Gleam also supports the Unix pipe tool set, such as "grep", "awk", "tr", "sort", and "uniq". These are simple, small, and highly efficient tools that can be combined to do great things. However, there are no systems that can run these tools in a distributed fashion.
Go's support for concurrent programming makes it easy to run shell scripts in parallel.
Gleam's Pipe() function is basically the same as a Unix pipeline. In addition to all the basic Unix tools, you can use anything written in Python/Ruby/Shell/Java/C/C++, or mix them together, to do distributed computing.
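The following sketch illustrates the idea behind piping data through Unix tools in parallel, using only Go's standard library. It is not how Gleam implements Pipe(): the helpers `pipe` and `pipeAll` are hypothetical, and the example assumes `sh`, `sort`, and `uniq` are available on the machine.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
	"sync"
)

// pipe feeds input to a shell script on stdin and returns its stdout.
// Hypothetical helper for illustration; not Gleam's Pipe().
func pipe(script string, input string) (string, error) {
	cmd := exec.Command("sh", "-c", script)
	cmd.Stdin = strings.NewReader(input)
	out, err := cmd.Output()
	return string(out), err
}

// pipeAll runs the same script over every partition, one goroutine per partition.
func pipeAll(script string, partitions []string) ([]string, error) {
	results := make([]string, len(partitions))
	errs := make([]error, len(partitions))
	var wg sync.WaitGroup
	for i, p := range partitions {
		wg.Add(1)
		go func(i int, p string) {
			defer wg.Done()
			// Each goroutine writes only to its own slot, so no lock is needed.
			results[i], errs[i] = pipe(script, p)
		}(i, p)
	}
	wg.Wait()
	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return results, nil
}

func main() {
	// Two partitions of newline-separated records, processed independently.
	partitions := []string{"b\na\nb\n", "d\nc\nc\n"}
	out, err := pipeAll("sort | uniq", partitions)
	if err != nil {
		panic(err)
	}
	for i, o := range out {
		fmt.Printf("partition %d:\n%s", i, o)
	}
}
```

Here every partition runs in a local process. A distributed system would additionally have to ship partitions to remote executors and merge the per-partition results, which this sketch leaves out.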
Spark is a popular system, and Gleam has more similarities to Spark than differences. However, Gleam has its own advantages:
- Memory efficient. One server can host many more executors.
- OS-managed memory in separate OS processes. No more JVM memory tuning.
- Fast to set up and run. Gleam agents and the Gleam master are very simple and very fast to set up.