layout | title | tagline |
---|---|---|
home | Lambda Architecture | A repository dedicated to the Lambda Architecture (LA). We collect and publish examples and good practices around the LA. |
{% include JB/setup %}
{% for post in site.posts %}
- {{ post.date | date_to_string }} » {{ post.title }} by {{ post.author }} {% endfor %}
Nathan Marz came up with the term Lambda Architecture (LA) for a generic, scalable and fault-tolerant data processing architecture, based on his experience working on distributed data processing systems at BackType and Twitter.
The LA aims to satisfy the need for a robust system that tolerates both hardware failures and human mistakes, serves a wide range of workloads and use cases, and supports low-latency reads and updates. The resulting system should be linearly scalable, and it should scale out rather than up.
Here's what it looks like, from a high-level perspective:
- All data entering the system is dispatched to both the batch layer and the speed layer for processing.
- The batch layer has two functions: (i) managing the master dataset (an immutable, append-only set of raw data), and (ii) pre-computing the batch views.
- The serving layer indexes the batch views so that they can be queried in a low-latency, ad-hoc way.
- The speed layer compensates for the high latency of updates to the serving layer and deals with recent data only.
- Any incoming query can be answered by merging results from batch views and real-time views.
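
The flow described above can be illustrated with a minimal, single-process sketch in Python. It is not taken from any particular LA implementation: the `Event` schema, the `LambdaSketch` class and its method names are hypothetical, the use case (page-view counts) is chosen only for illustration, and the batch run is assumed to be instantaneous, whereas a real batch layer runs for a long time on a distributed system while the speed layer keeps absorbing new data.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    """One immutable raw record; a page view is a hypothetical example schema."""
    url: str
    timestamp: int


@dataclass
class LambdaSketch:
    """Toy model of the three layers, all held in memory (an assumption)."""
    master_dataset: list = field(default_factory=list)       # append-only raw data
    batch_view: Counter = field(default_factory=Counter)     # precomputed, served for reads
    realtime_view: Counter = field(default_factory=Counter)  # recent data only

    def ingest(self, event: Event) -> None:
        # All incoming data is dispatched to both the batch and the speed layer.
        self.master_dataset.append(event)    # batch layer: append, never update
        self.realtime_view[event.url] += 1   # speed layer: incremental update

    def run_batch(self) -> None:
        # Batch layer: recompute the view from scratch over the whole master dataset.
        self.batch_view = Counter(e.url for e in self.master_dataset)
        # Simplification: the batch view now covers every event, so the
        # real-time view can be discarded entirely.
        self.realtime_view = Counter()

    def query(self, url: str) -> int:
        # A query merges results from the batch view and the real-time view.
        return self.batch_view[url] + self.realtime_view[url]


la = LambdaSketch()
la.ingest(Event("/home", 1))
la.ingest(Event("/home", 2))
la.ingest(Event("/about", 3))
print(la.query("/home"))  # answered from the real-time view only

la.run_batch()
la.ingest(Event("/home", 4))
print(la.query("/home"))  # batch view plus one recent event
```

Because the batch view is always rebuilt from the immutable master dataset, a bug in the view logic can be fixed and the view recomputed, which is one way to read the tolerance of human mistakes mentioned earlier.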
- Big Data, book by Nathan Marz and James Warren
- Applying the Big Data Lambda Architecture, Dr. Dobb's article by Michael Hausenblas
- The Lambda architecture: principles for architecting realtime Big Data systems, blog post by James Kinley
- Lambda Architecture: Achieving Velocity and Volume with Big Data, article by Christian Prokopp
- Lambda Architecture with Apache Spark by Michael Hausenblas
See the about us section for details.