forked from druid-io/druid-io.github.io
-
Notifications
You must be signed in to change notification settings - Fork 0
/
druid.html
71 lines (52 loc) · 6.19 KB
/
druid.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
---
title: What is Druid?
layout: html_page
sectionid: druid
---
<div class="druid-header">
<div class="container">
<h1>Druid is...</h1>
<h5 class="easter-egg">An Existential Journey</h5>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-9">
<h1 id="whatis">What is Druid?</h1>
<hr id="realrealtime">
<h2 id="realtime">Real-time Ingestion and Queries</h2>
<p><strong>Real-time data</strong> Typical analytics databases ingest data via batches. Ingesting an event at a time is often accompanied with transactional locks and other overhead that slows down the ingestion rate. Druid's real-time nodes employ lock-free ingestion of append-only data sets to allow for simultaneous ingestion and querying of 10,000+ events per second. Simply put, the latency between when an event happens and when it is visible is limited only by how quickly the event can be delivered to Druid.</p>
<p><strong>Ad hoc, multi-dimensional filtering.</strong> Druid maintains bitmap indexes compressed using <a href="http://ricerca.mat.uniroma3.it/users/colanton/concise.html">CONCISE</a> to determine what data it has to look at before it ever starts looking at data. This significantly speeds up ad hoc filtered queries, even allowing for fast OR queries which are traditionally slow. All this <a href="http://metamarkets.com/2012/druid-bitmap-compression/">without a significant impact on data footprint</a></p>
<p><strong>Column-oriented for speed.</strong> Data is laid out in columns so that scans are limited to the specific data being searched. <a href="/blog/2011/05/20/druid-part-deux.html">Compression decreases overall data footprint.</a></p>
<p><strong>Power dashboards.</strong> Druid can aggregate billions of events in real-time and in a multi-tenant environment and is often used to power interactive dashboards. Data in Druid is mostly immutable and Druid is designed to support numerous concurrent reads.</p>
<hr id="realrealtime">
<h2 id="scalable">Scalable and Available</h2>
<p><strong>In-memory or on-disk.</strong> Druid leverages the memory mapping capabilities of modern operating systems to allow for only relevant data to be loaded into memory while the rest can live on disk. This means that if your performance requirements dictate that the data must be in memory, then you can configure each node to only accept an amount of data that is equivalent to the available memory and it will all be in-memory. If you are ok with only having the working set in memory, each node can hold more than just the working set on a given machine and the requisite data will be swapped into memory on demand.</p>
<p><strong>Highly Available.</strong> Scaling up or down, replicating nodes, or recovering from failure typically impacts availability and performance. Druid uses a distributed architecture that allows replication at the segment level – relieving the load on “hot segments.” And, because of replication, Druid supports rolling deployments and restarts. Scale up or scale down just by adding or remove nodes, it’s that easy and no data has to be re-processed or re-indexed, just re-replicated.</p>
<hr id="realrealtime">
<h2 id="hri">Built for Analytics</h2>
<p><strong>Approximate and exact calculations.</strong> Druid supports approximate cardinality estimation, and approximate histogram and quantile calculation. Druid also supports a variety of exact aggregations.</p>
<p><strong>Extendable architecture.</strong> Write your own sketch algorithms or aggregators and plug them in directly into Druid.</p>
<hr id="where">
<h1>Where did Druid come from?</h1>
<p>Druid was created out of necessity by Metamarkets, a company focused on providing real-time interactive insight to the RTB (real time bidding) AdTech space with a full stack analytics service. Metamarkets required a system that could ingest data in real-time, provide ad-hoc N-dimensional drill down and still provide sub-second responses. As a hosted service, Metamarkets also required no downtime deployments, fault-tolerance and self-healing properties.</p>
<p>Druid was <a href="/blog/2012/10/24/introducing-druid.html">opened up</a> because Metamarkets is fully committed to the AdTech use case. However, it was felt that Druid had more general applicability to other spaces and it was in Metamarket’s best interests not to limit Druid’s development solely to its own use cases.</p>
<hr id="used">
<h1>What is Druid used for?</h1>
<p>Druid is purpose built infrastructure that provides for exploration of very large quantities of data as it is ingested into the system. It is generally used for dashboarding of event streams. Druid is used for production in various spaces such as ad-tech, telecommunications, social media, and operations. If you have a dataset that is too large for your current infrastructure, your data has a timestamp associated with every event and you want to arbitrarily filter into the data with your queries, then Druid can probably provide value for whatever your use case is as well.</p>
<h2 id="immediate_insight_to_large_quantities_of_data">Immediate insight to large quantities of data:</h2>
<p>The low time latency between when data is ingested into Druid and when that data is reflected in queries allows users understand what is going on “right now” instead of a few minutes ago.</p>
<h2 id="deep_exploratory_drilldown">Deep, exploratory drill-down:</h2>
<p>Users value the ability to create many arbitrary filter dimensions without impacting performance, breaking scalability or cost viability (compute infrastructure). Ad hoc drill down on immediate and historical data allows users to query both “fresh”, or immediately ingested data and historical data all at once.</p>
</div>
<div class="col-md-3">
<div class="druid-sidebar hidden-print hidden-xs hidden-sm">
<ul class="nav">
<li><a href="#realrealtime">What does real-time mean?</a></li>
<li><a href="#where">Where did Druid come from?</a></li>
<li><a href="#used">What is Druid used for?</a></li>
</ul>
</div>
</div>
</div>
</div>