Commit

deploy: d8b2360

lynnagara committed Jun 20, 2023
0 parents, commit 2778e8d
Showing 120 changed files with 22,447 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .buildinfo
@@ -0,0 +1,4 @@
# Sphinx build info version 1
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
config: ee10fcae1221319d5ff054cc0e656345
tags: 645f666f9bcd5a90fca523b33c5a78b7
Binary file added .doctrees/architecture/datamodel.doctree
Binary file added .doctrees/architecture/overview.doctree
Binary file added .doctrees/architecture/queryprocessing.doctree
Binary file added .doctrees/architecture/slicing.doctree
Binary file added .doctrees/clickhouse/death_queries.doctree
Binary file added .doctrees/clickhouse/schema_design.doctree
Binary file added .doctrees/clickhouse/supported_versions.doctree
Binary file added .doctrees/clickhouse/topology.doctree
Binary file added .doctrees/configuration/dataset.doctree
Binary file added .doctrees/configuration/entity.doctree
Binary file added .doctrees/configuration/intro.doctree
Binary file added .doctrees/configuration/migration_group.doctree
Binary file added .doctrees/configuration/overview.doctree
Binary file added .doctrees/configuration/writable_storage.doctree
Binary file added .doctrees/contributing/environment.doctree
Binary file added .doctrees/environment.pickle
Binary file added .doctrees/getstarted.doctree
Binary file added .doctrees/index.doctree
Binary file added .doctrees/intro.doctree
Binary file added .doctrees/language/snql.doctree
Binary file added .doctrees/migrations/modes.doctree
Binary file added .doctrees/query/overview.doctree
Empty file added .nojekyll
Binary file added _images/clickhouse_nodes.png
Binary file added _images/compositeprocessing.png
Binary file added _images/datamodel.png
Binary file added _images/deployment_legend.png
Binary file added _images/errors_transactions_deployment.png
Binary file added _images/joins.png
Binary file added _images/multientity.png
Binary file added _images/outcomes_deployment.png
Binary file added _images/overview.png
Binary file added _images/queryprocessing.png
Binary file added _images/sessions_deployment.png
Binary file added _images/singleentity.png
Binary file added _images/snubaUI.png
177 changes: 177 additions & 0 deletions _sources/architecture/datamodel.rst.txt
@@ -0,0 +1,177 @@
================
Snuba Data Model
================

This section explains how data is organized in Snuba and how user-facing
data is mapped to the underlying database (Clickhouse in this case).

The Snuba data model is divided horizontally into a **logical model** and
a **physical model**. The logical data model is what is visible to the Snuba
clients through the Snuba query language. Elements in this model may or may
not map 1:1 to tables in the database. The physical model, instead, maps 1:1
to database concepts (like tables and views).

The reasoning behind this division is that it allows Snuba to expose a
stable interface through the logical data model while internally performing
the complex mapping needed to execute a query against different tables (part
of the physical model), improving performance in a way that is transparent
to the client.

The rest of this section outlines the concepts that compose the two models
and how they are connected to each other.

The main concepts, described below, are the Dataset, the Entity, and the Storage.

.. image:: /_static/architecture/datamodel.png

Datasets
========

A Dataset is a namespace over Snuba data. It provides its own schema and
it is independent of other datasets in terms of both the logical and the
physical model.

Examples of datasets are discover, outcomes, and sessions. There is no
relationship between them.

A Dataset can be seen as a container for the components that define its
abstract data model and its concrete data model, which are described below.

In terms of the query language, every Snuba query targets one and only one
Dataset, and the Dataset can provide extensions to the query language.
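
As an illustration, the snippet below is a hedged sketch of what "targeting a
single Dataset" means in practice; the dataset name, entity name, field names,
and request shape are assumptions for illustration, not the exact Snuba API.

.. code-block:: python

    # Hedged sketch: dataset/entity names and the request shape are
    # illustrative, not the exact Snuba API.
    snql_query = """
        MATCH (errors)
        SELECT count() AS error_count BY project_id
        WHERE project_id IN tuple(1)
          AND timestamp >= toDateTime('2023-06-01T00:00:00')
          AND timestamp < toDateTime('2023-06-02T00:00:00')
    """

    # The request is addressed to exactly one Dataset; the entity named in
    # MATCH must belong to that Dataset's logical model.
    request_body = {"dataset": "events", "query": snql_query}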

Entities and Entity Types
=========================

The fundamental building block of the logical data model Snuba exposes to the
client is the Entity. In the logical model an entity represents an instance of
an abstract concept (like a transaction or an error). In practice an *Entity*
corresponds to a row in a table in the database. The *Entity Type* is the
class of the Entity (like Errors or Transactions).

The logical data model is composed of a set of *Entity Types* and of their
relationships.

Each *Entity Type* has a schema which is defined by a list of fields with
their associated abstract data types. The schemas of all the *Entity Types*
of a Dataset (there can be several) compose the logical data model that is
visible to the Snuba client and against which Snuba Queries are validated.
No lower level concept is supposed to be exposed.

Each Entity Type is contained in exactly one Dataset; an Entity Type cannot
be present in multiple Datasets.
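
For intuition, an Entity Type schema can be pictured as a named list of fields
with abstract types. The snippet below is a minimal sketch; the class names,
field names, and types are invented for illustration and do not mirror Snuba's
actual entity configuration.

.. code-block:: python

    from dataclasses import dataclass
    from enum import Enum


    class ColumnType(Enum):
        UINT64 = "UInt64"
        STRING = "String"
        DATETIME = "DateTime"


    @dataclass(frozen=True)
    class Column:
        name: str
        type: ColumnType


    @dataclass(frozen=True)
    class EntityTypeSchema:
        entity_name: str
        columns: list[Column]


    # Illustrative only: an "errors"-like Entity Type exposing abstract fields
    # that queries are validated against, independent of any physical table.
    errors_entity = EntityTypeSchema(
        entity_name="errors",
        columns=[
            Column("event_id", ColumnType.STRING),
            Column("project_id", ColumnType.UINT64),
            Column("timestamp", ColumnType.DATETIME),
        ],
    )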

Relationships between Entity Types
----------------------------------

Entity Types in a Dataset are logically related. There are two types of
relationships we support:

- Entity Set Relationship. This mimics foreign keys. This relationship is
  meant to allow joins between Entity Types. It only supports one-to-one
  and one-to-many relationships at this point in time.
- Inheritance Relationship. This mimics nominal subtyping. A group of Entity
  Types can share a parent Entity Type. Subtypes inherit the schema from the
  parent type. Semantically the parent Entity Type must represent the union
  of all the Entities whose types inherit from it. It also must be possible
  to query the parent Entity Type. This cannot be just a logical relationship.

Entity Type and consistency
---------------------------

The Entity Type is the largest unit where Snuba **can** provide some strong
data consistency guarantees. Specifically it is possible to query an Entity
Type expecting Serializable Consistency (please don't use that. Seriously,
if you think you need that, you probably don't). This does not extend to
any query that spans multiple Entity Types where, at best, we will have
eventual consistency.

This also has an impact on Subscription queries. These can only work on one
Entity Type at a time since, otherwise, they would require consistency between
Entity Types, which we do not support.

.. ATTENTION::
    To be precise, the unit of consistency (depending on the Entity Type)
    can be even smaller and depends on how the data ingestion topics
    are partitioned (by project_id, for example); the Entity Type is the
    maximum unit Snuba allows. More details are (ok, will be) provided in
    the Ingestion section of this guide.

Storage
=======

Storages represent and define the physical data model of a Dataset. Each
Storage is materialized as a physical database concept like a table or a
materialized view. As a consequence, each Storage has a schema, defined by
fields with their types, that reflects the physical schema of the DB
table/view the Storage maps to, and it is able to provide all the details
needed to generate the DDL statements that build the tables on the database.

Storages map the logical concepts of the logical model discussed above to the
physical concepts of the database, thus each Storage needs to be related to an
Entity Type. Specifically (see the sketch after this list):

- Each Entity Type must be backed by at least one Readable Storage (a Storage
  we can run queries on), but can be backed by multiple Storages (for example
  a pre-aggregate materialized view). Multiple Storages per Entity Type are
  meant to allow query optimizations.
- Each Entity Type must be backed by one and only one Writable
  Storage that is used to ingest data and fill in the database tables.
- Each Storage backs exactly one Entity Type.
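
A minimal sketch of the rules above follows. This is an illustration only
(class and storage names are invented); the real definitions live in Snuba's
dataset, entity, and storage configuration.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Storage:
        name: str
        writable: bool = False


    @dataclass
    class EntityType:
        name: str
        readable_storages: list[Storage]
        writable_storage: Storage

        def __post_init__(self) -> None:
            # Rule 1: at least one Readable Storage to run queries on.
            assert len(self.readable_storages) >= 1
            # Rule 2: exactly one Writable Storage, used for ingestion.
            assert self.writable_storage.writable
            # Rule 3 (each Storage backs exactly one Entity Type) would be
            # enforced at the Dataset level, not shown here.


    # An entity backed by a raw table (read/write) plus a pre-aggregated
    # materialized view used purely for faster reads.
    raw = Storage("errors_raw", writable=True)
    hourly = Storage("errors_hourly_mv")
    errors = EntityType("errors", readable_storages=[raw, hourly], writable_storage=raw)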



Examples
========

This section provides some examples of how the Snuba data model can represent
some real-world models.

These case studies do not necessarily reflect the current Sentry production
model, nor are they part of the same deployment. They should be considered
examples taken in isolation.

Single Entity Dataset
---------------------

This looks like the Outcomes dataset used by Sentry. This actually does not
reflect Outcomes as of April 2020; it is, though, the design Outcomes should
move towards.

.. image:: /_static/architecture/singleentity.png

This Dataset has only one Entity Type, which represents an individual Outcome
ingested by the Dataset. Querying raw Outcomes is painfully slow, so we have
two Storages: the Raw storage, which reflects the data we ingest, and a
materialized view that computes hourly aggregations that are much more efficient
to query. The Query Planner picks the storage depending on whether the query
can be executed on the aggregated data or not.
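
The storage-selection idea can be sketched as follows. The column sets and the
selection rule are simplified assumptions; the real Query Planner is
considerably more involved.

.. code-block:: python

    # Columns available on each storage (illustrative).
    RAW_COLUMNS = {"org_id", "project_id", "outcome", "reason", "timestamp", "event_id"}
    HOURLY_COLUMNS = {"org_id", "project_id", "outcome", "reason", "timestamp", "times_seen"}


    def pick_storage(referenced_columns: set[str]) -> str:
        """Prefer the hourly aggregate when the query only needs columns it has."""
        if referenced_columns <= HOURLY_COLUMNS:
            return "outcomes_hourly"
        return "outcomes_raw"


    assert pick_storage({"project_id", "outcome", "timestamp"}) == "outcomes_hourly"
    assert pick_storage({"event_id", "outcome"}) == "outcomes_raw"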

Multi Entity Type Dataset
-------------------------

The canonical example of this Dataset is the Discover dataset.

.. image:: /_static/architecture/multientity.png

This has three Entity Types: Errors and Transactions, which both inherit from
Events. These form the logical data model, thus querying the Events Entity
Type gives the union of Transactions and Errors, but only the fields common
to the two can be present in the query.

The Errors Entity Type is backed by two Storages for performance reasons.
One is the main Errors Storage that is used to ingest data; the other is a
read-only view that puts less load on Clickhouse when querying but
offers lower consistency guarantees. Transactions have only one storage,
and there is a Merge Table to serve Events (which is essentially a view over
the union of the two tables).

Joining Entity types
--------------------

This is a simple example of a dataset that includes multiple Entity Types
that can be joined together in a query.

.. image:: /_static/architecture/joins.png

GroupedMessage and GroupAssignee can be part of a left join query with Errors.
The rest is similar to what was discussed in the previous examples.
156 changes: 156 additions & 0 deletions _sources/architecture/overview.rst.txt
@@ -0,0 +1,156 @@
===========================
Snuba Architecture Overview
===========================

Snuba is a time-series-oriented data store backed by
`Clickhouse <https://clickhouse.tech/>`_, a columnar distributed
database well suited for the kind of queries Snuba serves.

Data is fully stored in Clickhouse tables and materialized views.
It is ingested through input streams (only Kafka topics today)
and can be queried either through point-in-time queries or through
streaming queries (subscriptions).

.. image:: /_static/architecture/overview.png

Storage
=======

Clickhouse was chosen as the backing storage because it provides a good
balance between the real-time performance Snuba needs, its distributed and
replicated nature, and its flexibility in terms of storage engines and
consistency guarantees.

Snuba data is stored in Clickhouse tables and Clickhouse materialized views.
Multiple Clickhouse `storage engines <https://clickhouse.tech/docs/en/engines/table-engines/>`_
are used depending on the goal of the table.

Snuba data is organized in multiple Datasets which represent independent
partitions of the data model. More details in the :doc:`/architecture/datamodel`
section.

Ingestion
=========

Snuba does not provide an API endpoint to insert rows (except when running
in debug mode). Data is loaded from multiple input streams, processed by
a series of consumers, and written to Clickhouse tables.

A consumer consumes one or more topics and writes to one or more
tables. As of today, no table is written to by multiple consumers. This
allows for the consistency guarantees discussed below.

Data ingestion is most effective in batches (for Kafka, but especially
for Clickhouse). Our consumers support batching and guarantee that one batch
of events taken from Kafka is passed to Clickhouse at least once. By properly
selecting a Clickhouse table engine that deduplicates rows we can achieve
exactly-once semantics if we accept eventual consistency.
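
The batching pattern can be sketched as below. Topic, table, and column names
are assumptions, and Snuba's real consumers are built on its own consumer
framework; the point is that offsets are committed only after a successful
bulk insert, which yields at-least-once delivery into Clickhouse.

.. code-block:: python

    import json

    from clickhouse_driver import Client
    from confluent_kafka import Consumer

    consumer = Consumer(
        {
            "bootstrap.servers": "localhost:9092",
            "group.id": "snuba-consumers-sketch",
            "enable.auto.commit": False,  # commit manually, after the write
        }
    )
    consumer.subscribe(["events"])  # topic name is illustrative
    clickhouse = Client("localhost")

    batch = []
    while True:
        msg = consumer.poll(timeout=1.0)
        if msg is not None and msg.error() is None:
            batch.append(json.loads(msg.value()))
        if len(batch) >= 1000:
            # One bulk insert per batch: Clickhouse strongly prefers large batches.
            clickhouse.execute(
                "INSERT INTO errors_local (event_id, project_id) VALUES",
                [(row["event_id"], row["project_id"]) for row in batch],
            )
            # Committing only after a successful insert gives at-least-once
            # delivery; a deduplicating table engine absorbs the duplicates.
            consumer.commit(asynchronous=False)
            batch.clear()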

Query
=====

The simplest query system is point-in-time. Queries are expressed in
the SnQL language (:doc:`/language/snql`) and are sent as HTTP POST calls.
The query engine processes the query (a process described in
:doc:`/architecture/queryprocessing`) and transforms it into a ClickHouse
query.
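
As an example, a point-in-time query can be issued with a plain HTTP client.
This is a hedged sketch: the port, endpoint path, and payload shape follow a
typical local-development setup and should be checked against the API
reference; the dataset, entity, and fields are illustrative.

.. code-block:: python

    import requests

    snql = """
        MATCH (events)
        SELECT count() AS occurrences BY title
        WHERE project_id = 1
          AND timestamp >= toDateTime('2023-06-19T00:00:00')
          AND timestamp < toDateTime('2023-06-20T00:00:00')
        ORDER BY occurrences DESC
        LIMIT 10
    """

    response = requests.post(
        "http://127.0.0.1:1218/events/snql",  # endpoint path is an assumption
        json={"query": snql, "dataset": "events"},
    )
    print(response.json())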

Streaming queries (done through the Subscription Engine) allow the client
to receive query results in a push fashion. In this case an HTTP endpoint allows
the client to register a streaming query. Then the Subscription Consumer consumes
the topic that is used to fill the relevant Clickhouse table, watching for updates;
it periodically runs the query through the Query Engine and produces the result
on the subscriptions Kafka topic.
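
A much-simplified sketch of that loop follows. The topic name, subscription
registry, and the ``run_query`` helper are placeholders; the real Subscription
Consumer also synchronizes with the commit log so it never queries ahead of
what has been written to Clickhouse.

.. code-block:: python

    import json
    import time

    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    # Registered subscriptions: identifier -> SnQL query (illustrative).
    subscriptions = {
        "sub-1": "MATCH (events) SELECT count() WHERE project_id = 1 ...",
    }


    def run_query(snql: str) -> dict:
        # Placeholder: the real engine executes the SnQL against Clickhouse.
        return {"data": []}


    while True:
        for subscription_id, snql in subscriptions.items():
            result = run_query(snql)
            producer.produce(
                "events-subscription-results",  # result topic name is illustrative
                key=subscription_id.encode("utf-8"),
                value=json.dumps(result).encode("utf-8"),
            )
        producer.flush()
        time.sleep(60)  # each subscription has its own resolution in practice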

Data Consistency
================

Different consistency models coexist in Snuba to provide different guarantees.

By default Snuba is eventually consistent. When running a query, by default,
there is no guarantee of monotonic reads, since Clickhouse is multi-leader,
a query can hit any replica, and there is no guarantee the replicas will
be up to date. Also, by default, there is no guarantee Clickhouse will have
reached a consistent state on its own.

It is possible to achieve strong consistency on specific queries by forcing
Clickhouse to reach consistency before the query is executed (FINAL keyword),
and by forcing queries to hit the specific replica the consumer writes onto.
This essentially uses Clickhouse as if it were a single-leader system, and it
allows sequential consistency.

================================
Snuba within a Sentry Deployment
================================

This section explains the role Snuba plays within a Sentry deployment, showing
the main data flows. If you are deploying Snuba standalone, this won't be
useful for you.

Legend:

.. image:: /_static/architecture/deployment_legend.png

Deployments:

Errors and transactions:

.. image:: /_static/architecture/errors_transactions_deployment.png


Sessions:

.. image:: /_static/architecture/sessions_deployment.png

Outcomes:

.. image:: /_static/architecture/outcomes_deployment.png

Errors and Transactions data flow
=================================

The main section at the top of the diagram illustrates the ingestion process
for the ``Events`` and ``Transactions`` Entities. These two entities serve
most issue/error-related features in Sentry and the whole Performance
product.

There is only one Kafka topic (``events``) shared between errors and transactions
that feeds this pipeline. This topic contains both error messages and transaction
messages.

The Errors consumer consumes the ``events`` topic and writes messages to the Clickhouse
``errors`` table. Upon commit, it also produces a record on the ``snuba-commit-log``
topic.

Alerts on Errors are generated by the Errors Subscription Consumer. This is a synchronized
consumer that consumes both the main ``events`` topic and the ``snuba-commit-log`` topic
so it can proceed in lockstep with the main consumer.

The synchronized consumer then generates alerts by querying Clickhouse and produces
the results on the result topic.

An identical but independent pipeline exists for transactions.

The Errors pipeline has an additional step: writing to the ``replacements`` topic.
Error mutations (merge/unmerge/reprocessing/etc.) are produced by Sentry on the
``events`` topic. They are then forwarded to the ``replacements`` topic by the
Errors Consumer and executed by the Replacement Consumer.

The ``events`` topic must be partitioned semantically by Sentry project ID to
allow in-order processing of the events within a project. This, as of today, is a
requirement for alerts and replacements.
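
One way to achieve that semantic partitioning, sketched below under the
assumption that Kafka's default partitioner is used, is to key every message
by project ID: the key is hashed, so all events of a project land on the same
partition and are consumed in order. Topic name and payload are illustrative.

.. code-block:: python

    import json

    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})


    def produce_event(project_id: int, payload: dict) -> None:
        # Keying by project_id hashes every event of a project to the same
        # partition, preserving per-project ordering.
        producer.produce(
            "events",
            key=str(project_id).encode("utf-8"),
            value=json.dumps(payload).encode("utf-8"),
        )


    produce_event(42, {"event_id": "abc123", "type": "error"})
    producer.flush()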

Sessions and Outcomes
=====================

``Sessions`` and ``Outcomes`` work in a very similar and simpler way. Specifically,
``Sessions`` powers Release Health features, while ``Outcomes`` mainly provides
data to the Sentry ``stats`` page.

Both pipelines have their own Kafka topic and Kafka consumer, and they write to their
own table in Clickhouse.

Change Data Capture pipeline
============================

This pipeline is still under construction. It consumes the ``cdc`` topic and fills
two independent tables in Clickhouse.